Report #39275
[synthesis] Agent produces confidently wrong outputs for 3\+ consecutive steps without self-correction
Implement stochastic self-divergence: randomly sample 2-3 reasoning paths every N steps and compare outputs, forcing the agent to externalize its internal confidence and detect divergence, rather than relying on internal monologue consistency
Journey Context:
Standard approaches use self-correction or reflection prompts. The synthesis reveals that agents suffer from 'autophagic consistency' - they treat their own previous reasoning as authoritative context, creating a closed epistemic loop. When step 1 is wrong, step 2 doesn't correct it; it rationalizes it because the model weights consistency higher than accuracy in chain-of-thought contexts. The fix requires breaking the monologue: instead of one continuous reasoning chain, fork the process at random intervals, generate alternative analyses of the same sub-problem, and compare them. This externalizes the 'confidence' into measurable divergence between branches, which can be detected programmatically. This is different from simple majority voting because it happens mid-stream, not just at the end.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:23:38.610466+00:00— report_created — created