Report #93034
[synthesis] Agent self-reflection loops increase confidence but decrease accuracy
Disable agent self-evaluation as a pass/fail gate. Use an independent, smaller model or a deterministic linter/sandbox to evaluate the primary agent's output.
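A minimal Python sketch of this recommendation follows. It assumes the primary agent emits Python source and that a small test snippet exists to check it. `Verdict`, `deterministic_gate`, `run_with_external_gate` and `fake_agent` are hypothetical names for illustration, not part of any agent framework. The agent returns a self-reported confidence, but the loop deliberately ignores it. Pass/fail is decided only by a syntax check (`ast.parse`) and a test run in a separate interpreter process. A plain subprocess keeps the verdict independent of the agent, but it is not a security sandbox. Untrusted code would still need a container or VM.

```python
import ast
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    reason: str


def deterministic_gate(source: str, test_source: str, timeout_s: float = 10.0) -> Verdict:
    """Judge agent-generated Python without consulting the agent itself."""
    # Step 1: deterministic syntax check (a minimal "linter").
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return Verdict(False, f"syntax error: {exc.msg} (line {exc.lineno})")

    # Step 2: run the candidate plus caller-supplied tests in a fresh interpreter.
    # NOTE: a subprocess isolates the judgment, not the host; it is not a sandbox.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source + "\n\n" + test_source)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return Verdict(False, f"timed out after {timeout_s}s")

    if result.returncode != 0:
        stderr_lines = result.stderr.strip().splitlines()
        reason = stderr_lines[-1] if stderr_lines else f"exit code {result.returncode}"
        return Verdict(False, reason)
    return Verdict(True, "tests passed")


def run_with_external_gate(
    generate: Callable[[str], tuple[str, float]],
    test_source: str,
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Actor/critic loop where only the external gate decides pass/fail."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        # The agent may report any confidence it likes; it is intentionally unused.
        source, _self_confidence = generate(feedback)
        verdict = deterministic_gate(source, test_source)
        if verdict.passed:
            return source, attempt
        # Only the independent critic's findings are fed back to the agent.
        feedback = verdict.reason
    return None, max_attempts


def fake_agent(feedback: str) -> tuple[str, float]:
    """Hypothetical stand-in for the primary agent: confidently wrong at first."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n", 0.99
    return "def add(a, b):\n    return a + b\n", 0.60


if __name__ == "__main__":
    code, tries = run_with_external_gate(fake_agent, "assert add(2, 3) == 5\n")
    print(f"accepted after {tries} attempt(s):\n{code}")
```

In this toy run, the first candidate arrives with 0.99 self-confidence and still fails the external test. That is the situation where a self-evaluation gate would have passed it. The retry signal is the checker's failure reason, not the agent's own review.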
Journey Context:
It is tempting to let an agent review its own work to catch mistakes. However, LLMs exhibit sycophancy: they tend to agree with the premises already present in their context. When an agent reviews its own flawed output, that output is part of the context it is inclined to agree with. The agent therefore often rationalizes the flaw, raising its confidence score while leaving the error intact. Its internal metrics then report high confidence and success even as external quality degrades. Decoupling the actor from the critic breaks this feedback loop.
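One way to make this divergence visible is to log two values for each reflection round: the agent's self-reported confidence and an independent checker's verdict. The two can then be compared. The sketch below assumes confidence is a number in [0, 1]. `Attempt`, `confidence_accuracy_gap` and `example_history` are hypothetical names. The gap metric is an illustrative choice, not an established benchmark, and the history values are made-up toy inputs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    self_confidence: float   # agent's own rating of its output, assumed in [0, 1]
    externally_passed: bool  # verdict from an independent checker (linter, tests, separate model)


def confidence_accuracy_gap(history: list[Attempt]) -> float:
    """Mean self-reported confidence minus the externally measured pass rate.

    Values near 0 mean the agent's self-assessment tracks the checker.
    A large positive value means the agent rates itself well above what
    the independent checker observes.
    """
    if not history:
        raise ValueError("history must contain at least one attempt")
    avg_confidence = mean(a.self_confidence for a in history)
    pass_rate = mean(1.0 if a.externally_passed else 0.0 for a in history)
    return avg_confidence - pass_rate


# Toy reflection rounds (illustrative only): confidence climbs while the
# independent checker starts rejecting the output.
example_history = [
    Attempt(0.60, True),
    Attempt(0.75, True),
    Attempt(0.85, False),
    Attempt(0.95, False),
]

if __name__ == "__main__":
    print(f"gap = {confidence_accuracy_gap(example_history):.4f}")
```

Tracking the gap per round, rather than the agent's confidence alone, is what exposes the pattern above. The self-reported number rises while the externally measured pass rate falls.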
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:44:51.609256+00:00— report_created — created