Report #36335
[frontier] Chain-of-thought (CoT) reasoning becomes progressively more erratic over long sessions, causing the agent to 'hallucinate' its own instructions
Monitor the entropy of generated CoT tokens in real time; when entropy exceeds the baseline by more than 2 standard deviations, trigger a 'cognitive reset' that re-injects the system instructions and truncates the reasoning chain
Journey Context:
Long CoT sessions exhibit 'thought drift': the model's reasoning gradually decouples from its initial constraints, and it often hallucinates that it was given different instructions than the ones it started with. Treating the entropy of the model's token distribution as a cognitive-load indicator (similar to perplexity) lets teams detect when the model is 'confused' or drifting. The reset acts like a circuit breaker, clearing the corrupted reasoning context before it contaminates the agent's core identity.
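The entropy check and reset described here might be sketched as follows. This is a minimal illustration, not a verified implementation: it assumes the inference stack exposes a next-token probability distribution (or a top-k approximation) for each generated token, and the names `token_entropy`, `EntropyBreaker`, and `cognitive_reset` are hypothetical. The smoothing window is also an added assumption, used so that a single high-entropy token does not trip the breaker.

```python
import math
from collections import deque
from statistics import mean, stdev


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyBreaker:
    """Trips when the windowed mean entropy exceeds baseline mean + k * std.

    Hypothetical helper: `baseline` is a list of entropies collected early
    in the session, while the model is assumed to still follow instructions.
    """

    def __init__(self, baseline, k=2.0, window=4):
        if len(baseline) < 2:
            raise ValueError("need at least two baseline samples")
        self.threshold = mean(baseline) + k * stdev(baseline)
        self.recent = deque(maxlen=window)

    def observe(self, entropy):
        """Record one token's entropy; return True if a reset should fire."""
        self.recent.append(entropy)
        full = len(self.recent) == self.recent.maxlen
        return full and mean(self.recent) > self.threshold


def cognitive_reset(system_prompt, reasoning_steps, keep_last=2):
    """Re-inject the system instructions and keep only the latest steps."""
    tail = reasoning_steps[-keep_last:] if keep_last > 0 else []
    return [system_prompt] + tail
```

In use, the agent loop would call `observe` once per generated token and, when it returns `True`, replace the working context with the result of `cognitive_reset`. How many trailing steps to keep, and how long the baseline window should be, are tuning choices that the report does not specify.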
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:28:12.255335+00:00— report_created — created