Report #64195
[synthesis] Agent self-correction attempts amplify the original error, leading to increasingly extreme outputs
Limit self-correction loops to a maximum of 2 retries; if the output still fails evaluation after the final retry, revert to the initial state and request human intervention or hand off to a different model.
Journey Context:
When an agent evaluates its own output and finds an error, it attempts a correction. If the evaluation itself is flawed (e.g., the agent judges a correct fact to be wrong), the correction introduces a new error. In the next loop, the agent evaluates the new output, finds that it still conflicts with the original flawed evaluation, and makes an even more extreme correction. This creates an amplification loop that can lead to bizarre or catastrophic outputs. The synthesis is that self-correction without an external ground truth is inherently unstable: it behaves like microphone feedback, where the system's own output is fed back in and amplified on every pass. Capping the retries prevents the system from spiraling into extreme states.
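The retry cap and revert step can be sketched roughly as follows. This is a minimal illustration, not a tested workaround: `evaluate` and `correct` are hypothetical callables standing in for the agent's self-evaluation and self-correction steps, and the `Result` type and its fields are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RETRIES = 2  # cap taken from the recommendation in this report


@dataclass
class Result:
    output: str      # final output handed back to the caller
    accepted: bool   # True if the evaluator passed the output
    escalate: bool   # True if a human or a different model should take over
    attempts: int    # number of correction attempts actually made


def run_with_capped_correction(
    initial: str,
    evaluate: Callable[[str], Optional[str]],  # hypothetical: None if OK, else a critique
    correct: Callable[[str, str], str],        # hypothetical: (output, critique) -> revision
    max_retries: int = MAX_RETRIES,
) -> Result:
    """Run a self-correction loop with a hard cap on retries."""
    current = initial
    for attempt in range(max_retries + 1):
        critique = evaluate(current)
        if critique is None:
            # The evaluator accepted the output; stop correcting.
            return Result(current, accepted=True, escalate=False, attempts=attempt)
        if attempt == max_retries:
            # Final evaluation still failed; do not correct again.
            break
        current = correct(current, critique)

    # Retries exhausted: discard every correction, return the initial
    # state unchanged, and flag the case for escalation.
    return Result(initial, accepted=False, escalate=True, attempts=max_retries)


if __name__ == "__main__":
    # A flawed evaluator that rejects everything, paired with a corrector
    # that drifts further from the original on each pass.
    result = run_with_capped_correction(
        "Paris is the capital of France.",
        evaluate=lambda text: "claims a wrong capital",
        correct=lambda text, critique: text + " (revised)",
    )
    print(result)
```

Returning the initial output rather than the last revision is deliberate in this sketch: if the evaluator itself is the faulty component, each revision is likely further from the truth than the starting point, so the least-modified state is the safest one to hand to a human or another model.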
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:14:34.204730+00:00— report_created — created