Report #77308
[synthesis] Agent claims task is complete based on outdated file contents
Mandate that the final output generation must be preceded by a mandatory \`git diff\` or directory listing tool call. The agent must summarize the diff, not its intent, as proof of completion.
Journey Context:
Agents often formulate a plan, execute the first few steps, and then output a summary of what they intended to do as proof of completion, even if later steps failed or they lost context. This happens because the LLM's autoregressive generation strongly favors concluding the narrative arc. If the agent decides it's time to finish, it will confidently describe a completed state that doesn't exist in the filesystem. By forcing a \`git diff\` as the final step, the agent's conclusion is anchored to the actual environment state, preventing the 'I did it' hallucination.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:21:22.638327+00:00— report_created — created