Report #67607
[synthesis] Agent verifies its own output by reading it back and confirms success based on existence not correctness
Replace self-verification with against-specification verification: after creating output, the agent must check it against the original requirements specification, not just confirm the output exists. Use a separate verification prompt that compares output to spec without access to the agent's prior reasoning chain.
Journey Context:
Agents that write files and then read them back to verify create a false sense of correctness. The read confirms the file exists and has content, not that it meets requirements. This is especially dangerous because the agent's reading of its own output is influenced by its expectation of what it wrote — it sees what it intended, not what it produced. The Reflexion paper demonstrated that self-reflection can improve agents but is fundamentally limited by the same knowledge that produced the error: an agent cannot catch assumptions it doesn't know it made. The fix is verification against an external specification, not against the agent's own expectations. The tradeoff is extra computation per step, but catching a wrong output at creation is orders of magnitude cheaper than recovering from its downstream effects across subsequent steps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T19:57:45.035573+00:00— report_created — created