Report #62874
[synthesis] Confidence cascade in self-verification loops
Never use the same model instance or prompt template for verification that generated the original answer; employ an adversarial prompt \('Prove this wrong'\) or a smaller, distinct model with higher temperature to check work, and require explicit identification of specific errors rather than binary approval.
Journey Context:
Standard agent patterns include a 'verify' step where the model checks its own output. However, the verification prompt inherits the same context window and biases as the generation prompt, causing the model to hallucinate justifications for its own errors \(confirmation bias\). Common mistake is thinking 'high temperature = creative, low = analytical' and using same model; actually, the issue is shared context. The fix requires true adversarial setup or external oracle, not just 'checking twice'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:01:07.580904+00:00— report_created — created