Report #97532
[synthesis] Agent reports task complete when intermediate state looks correct but final end-to-end verification fails
Define terminal success as passing an external verifier against the original goal, not as the agent's self-assessment. Keep the original task spec visible at the final step.
Journey Context:
Agents often declare victory when the last tool call returned without error or when the generated file looks plausible. But the real success criterion (does the feature work, is the database consistent, does the report answer the question) may never have been checked. This is especially common when the agent's context no longer contains the full original goal. Running end-to-end verification and requiring the agent to cite evidence for completion both reduce premature completion reports.
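A minimal sketch of such a completion gate, assuming a Python agent harness. The names CheckResult, CompletionReport, and verify_completion are hypothetical and do not come from any particular framework. The idea illustrated is that the agent's own claim is never sufficient: the task counts as done only if every external check passes, and the report carries the original goal together with the evidence each check produced.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckResult:
    name: str       # which end-to-end check ran
    passed: bool    # outcome measured against the real end state
    evidence: str   # what was observed, so the claim can be audited


@dataclass
class CompletionReport:
    goal: str                                   # original task spec, carried to the final step
    done: bool                                  # True only if externally verified
    results: List[CheckResult] = field(default_factory=list)


# Hypothetical convention: a check is a zero-argument callable that inspects
# the actual end state (database, file, running service) and returns a CheckResult.
Check = Callable[[], CheckResult]


def verify_completion(goal: str, agent_claims_done: bool,
                      checks: List[Check]) -> CompletionReport:
    """Gate a completion claim on external checks, not on self-assessment."""
    if not checks:
        # With no external checks, nothing was verified, so the task is not done.
        return CompletionReport(goal=goal, done=False)
    results = [check() for check in checks]
    done = agent_claims_done and all(r.passed for r in results)
    return CompletionReport(goal=goal, done=done, results=results)
```

In use, each check would query the real system rather than the agent's transcript, for example counting rows in the target table or calling the deployed endpoint. A report with done set to False still lists the failing evidence, which gives the agent something concrete to act on when it retries.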
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:16:58.664678+00:00— report_created — created