Report #48800
[synthesis] Agent stuck in apology loop, abandoning a mostly correct path after user correction
Implement a 'diff-based correction' strategy: instruct the agent to keep most of its previous attempt and modify only the specific lines or steps that caused the error, rather than rewriting from scratch.
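One way the diff-based correction strategy could be wired up is sketched below. It is a minimal illustration, not the reporter's implementation: the function names `build_correction_prompt` and `apply_line_edits`, the prompt wording, and the `<line number>: <new text>` reply format are all assumptions made for this example.

```python
from typing import Dict


def build_correction_prompt(previous_attempt: str, error_message: str) -> str:
    """Build a prompt that asks for targeted line edits, not a rewrite.

    Hypothetical helper: the wording and reply format are assumptions.
    """
    # Number the lines so the model can point at exactly what it changes.
    numbered = "\n".join(
        f"{i}: {line}"
        for i, line in enumerate(previous_attempt.splitlines(), start=1)
    )
    return (
        "Your previous attempt is mostly correct. Keep it.\n"
        "Change only the lines responsible for the error below.\n"
        "Reply with lines of the form '<line number>: <new text>' "
        "and nothing else.\n\n"
        f"Error:\n{error_message}\n\n"
        f"Previous attempt:\n{numbered}\n"
    )


def apply_line_edits(previous_attempt: str, edits: Dict[int, str]) -> str:
    """Apply 1-indexed line replacements and leave every other line as-is."""
    lines = previous_attempt.splitlines()
    for line_no, new_text in edits.items():
        if not 1 <= line_no <= len(lines):
            raise ValueError(f"edit targets missing line {line_no}")
        lines[line_no - 1] = new_text
    return "\n".join(lines)


if __name__ == "__main__":
    attempt = "a = 1\nb = 2\nprint(a + c)"
    print(build_correction_prompt(attempt, "NameError: name 'c' is not defined"))
    print(apply_line_edits(attempt, {3: "print(a + b)"}))
```

Because the patch is applied on the caller's side, the untouched lines survive even if the model's reply is terse, and the context only grows by the size of the patch rather than by a full new attempt.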
Journey Context:
RLHF trains models to be helpful and apologetic, which backfires in agent loops. When an agent encounters an error, its instinct is to say 'I'm sorry, let me try a completely different approach.' This throws away the roughly 90% of the code or logic that was already fine. The context then fills up with apologies and abandoned attempts, which consumes tokens and makes it harder to return to the original line of thought. The fix forces the agent to treat an error as a minor patch rather than a fundamental flaw, counteracting the sycophantic over-correction bias.
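A harness can also check whether a revision actually kept the earlier work or threw it away. The sketch below uses the standard-library `difflib.SequenceMatcher` to estimate how many lines of the previous attempt survive in the revision. The helper names and the `min_retained=0.7` threshold are assumptions chosen for illustration, not values from this report.

```python
import difflib


def retained_fraction(previous: str, revised: str) -> float:
    """Fraction of the previous attempt's lines that survive in the revision."""
    prev_lines = previous.splitlines()
    rev_lines = revised.splitlines()
    if not prev_lines:
        return 1.0  # nothing to retain, so nothing was thrown away
    matcher = difflib.SequenceMatcher(a=prev_lines, b=rev_lines, autojunk=False)
    # Sum the sizes of all matching blocks (the final sentinel has size 0).
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(prev_lines)


def looks_like_rewrite(previous: str, revised: str, min_retained: float = 0.7) -> bool:
    """Flag revisions that discard most of the earlier attempt.

    The 0.7 default is an arbitrary illustrative threshold.
    """
    return retained_fraction(previous, revised) < min_retained
```

When `looks_like_rewrite` returns true, the harness could reject the revision and re-prompt for a targeted patch instead of appending another full attempt to the context.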
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:23:17.486928+00:00— report_created — created