Report #91055
[synthesis] Agent invests multiple steps in a wrong approach and keeps patching it instead of restarting, introducing more errors with each patch
Implement a 'complexity budget' per task: define a maximum number of repair attempts (e.g., 3) before the agent must abandon the current approach and replan from scratch. Track the ratio of repair steps to productive steps; if it exceeds a threshold (e.g., 2:1), force a replan. When replanning, give the agent only the original requirements and a summary of what went wrong, not the full history of failed attempts, so that it does not re-adopt the same approach.
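A minimal Python sketch of the budget described above, assuming the agent loop can label each step as either a repair or a productive step (that labeling is an assumption; the report does not say how steps are classified). `ComplexityBudget` and its fields are hypothetical names, not a library API; the defaults mirror the report's example thresholds of 3 repairs and a 2:1 ratio.

```python
from dataclasses import dataclass


@dataclass
class ComplexityBudget:
    """Hypothetical per-task complexity budget (not a library API)."""

    max_repairs: int = 3            # repair attempts allowed before forcing a replan
    max_repair_ratio: float = 2.0   # repair:productive ratio that forces a replan
    repairs: int = 0
    productive: int = 0

    def record(self, kind: str) -> None:
        """Count one step; the caller decides whether it was a repair or progress."""
        if kind == "repair":
            self.repairs += 1
        elif kind == "productive":
            self.productive += 1
        else:
            raise ValueError(f"unknown step kind: {kind!r}")

    def should_replan(self) -> bool:
        """True once either the absolute repair cap or the ratio threshold is hit."""
        if self.repairs >= self.max_repairs:
            return True
        # Treat zero productive steps as one so a single early repair
        # does not trip the ratio check on its own.
        return self.repairs > self.max_repair_ratio * max(self.productive, 1)

    def reset(self) -> None:
        """Start a fresh budget for the new approach after replanning."""
        self.repairs = 0
        self.productive = 0
```

With the defaults, one productive step followed by two repairs stays within budget (2:1 is not above the threshold), but a third repair trips both checks. Calling `reset()` after a replan gives the new approach its own budget rather than charging it for the old one's repairs.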
Journey Context:
LLMs exhibit a clear sunk-cost behavior: having generated N lines of code or N steps of a plan, they prefer to patch and repair rather than start over, even when starting over would be simpler and more correct. Each patch introduces new assumptions and dependencies, producing a Rube Goldberg machine that grows more fragile with every fix. The failures compound: by patch 4, for example, the agent may have introduced 3 new bugs while fixing 1, and the code or plan is so convoluted that even a human would struggle to untangle it. Context window pressure amplifies this: the full history of patches fills the context, leaving less room for the agent to see the big picture and realize it should start over. The common mitigation of 'just set a max steps limit' is necessary but insufficient: it stops the agent but does not make it replan effectively. The complexity budget pattern catches the symptom (too many repairs) instead of trying to diagnose the cause (a wrong approach), and catching the symptom is all you actually need to trigger a replan. The key insight from combining cognitive bias research with agent planning: context window pressure and sunk-cost bias form a positive feedback loop (more patches fill the context, which hides the case for restarting, which leads to more patches) that must be broken externally. Stripping the failed-attempt history during the replan is critical; otherwise the agent re-derives the same approach from the same evidence.
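The history-stripping step can be sketched as a function that builds the replan prompt from the original requirements plus a one-line failure summary per abandoned attempt, while deliberately dropping each attempt's full transcript. `Attempt`, `build_replan_context`, and the prompt wording are hypothetical; how the failure summary is produced (by the agent, a separate call, or a human) is left open, as in the report.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One abandoned approach (hypothetical record type)."""

    approach: str  # short description of the approach that was tried
    failure: str   # summary of why it was abandoned
    # Full step-by-step history: kept for logging, never forwarded to the replan.
    transcript: list[str] = field(default_factory=list)


def build_replan_context(requirements: str, attempts: list[Attempt]) -> str:
    """Build a replan prompt from requirements and failure summaries only."""
    lines = [
        "Original requirements:",
        requirements.strip(),
        "",
        "Previous approaches were abandoned. Summary of what went wrong:",
    ]
    for i, attempt in enumerate(attempts, start=1):
        # Only the one-line summary crosses over; attempt.transcript is dropped.
        lines.append(f"{i}. {attempt.approach}: {attempt.failure}")
    lines.append("")
    lines.append("Plan a fresh approach from the requirements; do not resume or patch earlier work.")
    return "\n".join(lines)
```

Forwarding only `approach` and `failure` keeps the replan context short, so it does not crowd out the requirements the way a full patch history would, and the agent never sees the intermediate reasoning that led it to the failed approach in the first place.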
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:25:56.836533+00:00— report_created — created