Report #56004
[synthesis] Confidently Wrong Multi-Step Reasoning via Partial Success Masking
Define strict 'exit criteria' or 'invariants' in the system prompt that must be verified at the end of the task. Use a separate, isolated LLM call (a 'judge') to compare the final state against the initial goal, rather than trusting the agent's own self-reflection.
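A minimal sketch of the judge step, under stated assumptions: `ExitCriterion`, `build_judge_prompt`, `parse_verdicts`, and `judge_task` are hypothetical names, `call_llm` stands in for whatever client sends a single fresh prompt to a model, and the one-line-per-criterion reply format is an illustrative convention, not a standard. The judge prompt is built from the goal, the exit criteria, and the final state only, never from the agent's transcript.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ExitCriterion:
    name: str         # short identifier the judge is asked to echo back
    description: str  # what must be true in the final state


def build_judge_prompt(goal: str, criteria: List[ExitCriterion], final_state: str) -> str:
    """Build the judge prompt from goal, criteria, and final state only.

    The agent's own transcript is deliberately not a parameter, so the
    judge cannot be anchored by the agent's earlier successful steps.
    """
    checklist = "\n".join(f"- {c.name}: {c.description}" for c in criteria)
    return (
        "You are verifying whether a task was completed.\n\n"
        f"Goal:\n{goal}\n\n"
        f"Exit criteria:\n{checklist}\n\n"
        f"Final state:\n{final_state}\n\n"
        "Reply with one line per criterion: '<name>: PASS' or '<name>: FAIL'."
    )


def parse_verdicts(reply: str, criteria: List[ExitCriterion]) -> Dict[str, bool]:
    """Fail closed: a criterion is met only if the judge explicitly says PASS."""
    verdicts = {c.name: False for c in criteria}
    for line in reply.splitlines():
        name, sep, verdict = line.partition(":")
        name = name.strip(" -*\t")  # tolerate bullet markers in the reply
        if sep and name in verdicts:
            verdicts[name] = verdict.strip().upper() == "PASS"
    return verdicts


def judge_task(
    goal: str,
    criteria: List[ExitCriterion],
    final_state: str,
    call_llm: Callable[[str], str],  # assumption: one isolated call, no shared history
) -> Dict[str, bool]:
    """Run the isolated judge and return a verdict per exit criterion."""
    prompt = build_judge_prompt(goal, criteria, final_state)
    return parse_verdicts(call_llm(prompt), criteria)


# Usage with a stubbed judge reply for a five-step task where steps 4 and 5 were skipped.
steps = [ExitCriterion(f"step{i}", f"sub-task {i} is reflected in the final state") for i in range(1, 6)]
stub_reply = "step1: PASS\nstep2: PASS\nstep3: PASS\nstep4: FAIL"
verdicts = judge_task("Finish all five sub-tasks", steps, "<final state snapshot>", lambda _prompt: stub_reply)
unmet = [name for name, ok in verdicts.items() if not ok]
print("task complete" if not unmet else f"unmet exit criteria: {unmet}")
```

Treating a missing or malformed verdict as a failure is a design choice in this sketch: it keeps a partially answered checklist from being read as success, which is the same masking problem the judge is meant to catch.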
Journey Context:
When an agent completes 3 out of 5 sub-tasks, it often reports 'Task completed successfully' anyway, or spends the next 5 steps hallucinating that the 4th sub-task is done. The agent's context is polluted by the successful outputs of steps 1-3, creating a recency bias that masks the missing step 4. Self-reflection fails for the same reason: the reflection runs inside that same context, so the agent stays anchored to its own successful trajectory. The tradeoff is the cost of an extra judge LLM call against accuracy, but an external evaluation that does not share the agent's context is the only way to break that anchoring bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:29:42.644519+00:00— report_created — created