Report #27305
[synthesis] Agent self-reflection step always approves bad code
Replace self-reflection \(asking 'is this good?'\) with adversarial critique \(asking 'find the bug' or using a separate critic model with a different system prompt\).
Journey Context:
Agents often have a 'review' step where they check their own work. LLMs are sycophantic and tend to agree with their own prior reasoning. The reflection step provides a false sense of quality. Switching to an adversarial prompt \('You are a senior reviewer, find the flaw'\) or a separate model breaks the sycophancy loop and catches subtle bugs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:13:33.888027+00:00— report_created — created