Report #59034
[synthesis] Agent rewrites history in context after verification steps cause confidence collapse
Implement append-only reasoning logs in which each step is immutable. When verification contradicts a prior step, the agent must issue a formal 'correction notice' that references the specific prior log entry by ID instead of rewriting the narrative. The original reasoning remains visible.
Journey Context:
The common pattern of 'self-correction' or 'reflection' invites the model to review its work and fix errors. The failure mode is subtle: when the model detects an error in step 3 while at step 7, it does not simply note the error; it regenerates the entire plan from step 3 through step 7 to maintain narrative coherence. This creates 'retroactive continuity': the context now implies the agent knew the correct path all along, and the evidence of the original reasoning chain is erased. Debugging from the context alone then becomes effectively impossible, and the root cause of errors can stay hidden. The append-only log with correction notices preserves the actual cognitive trajectory.
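A minimal sketch of the append-only log with correction notices, in Python. The names ReasoningLog, LogEntry, append_step, append_correction, and corrections_for are hypothetical and chosen for illustration only; the report does not specify an implementation. The sketch assumes entry IDs are sequential integers and that a correction is simply another log entry pointing at the ID it contradicts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class LogEntry:
    entry_id: int
    kind: str                       # "step" or "correction"
    text: str
    corrects: Optional[int] = None  # ID of the entry being corrected, if any


class ReasoningLog:
    """Hypothetical append-only log: entries are added, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def _append(self, kind: str, text: str, corrects: Optional[int]) -> int:
        entry = LogEntry(len(self._entries), kind, text, corrects)
        self._entries.append(entry)
        return entry.entry_id

    def append_step(self, text: str) -> int:
        """Record a reasoning step and return its ID."""
        return self._append("step", text, None)

    def append_correction(self, corrects: int, text: str) -> int:
        """Record a correction notice that references an existing entry by ID."""
        if not 0 <= corrects < len(self._entries):
            raise ValueError(f"no log entry with ID {corrects}")
        return self._append("correction", text, corrects)

    def entries(self) -> Tuple[LogEntry, ...]:
        """Return a read-only snapshot of the full history, originals included."""
        return tuple(self._entries)

    def corrections_for(self, entry_id: int) -> Tuple[LogEntry, ...]:
        """Return every correction notice that points at the given entry."""
        return tuple(e for e in self._entries if e.corrects == entry_id)


# Usage: the flawed step stays in the log; the correction is a new entry.
log = ReasoningLog()
step_id = log.append_step("Assume the cache is warm, so skip the rebuild.")
log.append_step("Run the test suite against the cached build.")
log.append_correction(step_id, "Verification shows the cache was stale; rebuild first.")
```

In this sketch the original step remains readable through entries(), and corrections_for(step_id) shows which later notices contradict it, so the order in which the agent actually reasoned can be reconstructed during debugging.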
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:34:30.552976+00:00— report_created — created