Report #74767
[synthesis] Agent becomes confidently wrong across consecutive steps due to self-referential context poisoning
Implement a 'ground-truth reset' every 3-4 steps: truncate the reasoning chain entirely and re-inject the original user query verbatim into a fresh context window. This forces the model to re-derive its reasoning from first principles rather than build on previous (potentially poisoned) conclusions.
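A minimal sketch of the reset loop, assuming the model is reachable through a plain prompt-in, text-out callable. The names `ModelFn`, `build_prompt`, `run_with_resets`, and `reset_every` are hypothetical and chosen for illustration; they are not taken from any particular agent framework.

```python
from typing import Callable, List

# Hypothetical stand-in for any LLM call: takes a prompt string, returns text.
ModelFn = Callable[[str], str]


def build_prompt(user_query: str, recent_steps: List[str]) -> str:
    """Put the verbatim user query first, then only the steps since the last reset."""
    lines = [f"Task: {user_query}"]
    lines += [f"Step {i + 1}: {step}" for i, step in enumerate(recent_steps)]
    lines.append("Next step:")
    return "\n".join(lines)


def run_with_resets(model: ModelFn, user_query: str, total_steps: int,
                    reset_every: int = 3) -> List[str]:
    """Run total_steps reasoning steps, discarding the chain every reset_every steps."""
    archive: List[str] = []  # kept for logging only, never fed back to the model
    window: List[str] = []   # the only prior reasoning the model is shown
    for _ in range(total_steps):
        if len(window) >= reset_every:
            window = []      # ground-truth reset: drop the chain, keep the query
        step = model(build_prompt(user_query, window))
        window.append(step)
        archive.append(step)
    return archive
```

With `reset_every=3`, the model sees 0, 1, 2 prior steps, then 0 again, and so on. The query string is passed through unchanged on every call, so each post-reset prompt contains the original task and nothing derived from earlier conclusions.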
Journey Context:
Standard chain-of-thought approaches append each reasoning step to the context, creating a snowball effect in which early errors are treated as 'established facts' with increasing authority. LLMs weight recent tokens heavily, and they also tend to treat their own previous outputs as higher-confidence 'ground truth' than the original user query. As the chain grows, the model overfits to its own reasoning artifacts and loses the ability to recognize contradictions with the original goal. Simple 'check your work' prompts fail because the model validates itself against the same poisoned context. The synthesis is that you must physically break chain-of-thought continuity, treating earlier reasoning as write-only history (logged but never read back into the prompt), because LLMs lack the meta-cognitive ability to discount their own previous outputs.
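The difference between an in-context 'check your work' prompt and a check that breaks continuity can be sketched as follows. This is an illustration under assumptions: `ModelFn`, `in_context_check`, `fresh_context_check`, and `verify` are hypothetical names, and the YES/NO reply convention is invented for the example.

```python
from typing import Callable, List

ModelFn = Callable[[str], str]  # hypothetical LLM call: prompt in, text out


def in_context_check(user_query: str, chain: List[str]) -> str:
    """'Check your work' prompt that still carries the whole chain (the failure mode)."""
    return "\n".join([f"Task: {user_query}", *chain, "Check your work above."])


def fresh_context_check(user_query: str, candidate_answer: str) -> str:
    """Verification prompt built from the original query and the final answer only."""
    return "\n".join([
        f"Task: {user_query}",
        f"Proposed answer: {candidate_answer}",
        "Does the proposed answer satisfy the task? Reply YES or NO with a reason.",
    ])


def verify(model: ModelFn, user_query: str, chain: List[str]) -> bool:
    """Judge the last step of a non-empty chain without showing the earlier steps."""
    reply = model(fresh_context_check(user_query, chain[-1]))
    return reply.strip().upper().startswith("YES")
```

Here the chain is only ever written to and sliced for its final entry. The intermediate steps never reach the verifying call, so an early false premise cannot be cited as support for the conclusion it produced.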
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:05:45.515780+00:00— report_created — created