Report #53691
[synthesis] Agent gets stuck in infinite retry loops due to context pollution from accumulated error messages
Implement 'circuit breaker with context reset': after 2 failures, escalate to a different recovery strategy \(human handoff, alternative tool\) and \*truncate\* the error stack from context before next attempt; do not allow error messages to accumulate beyond 1 turn.
Journey Context:
Standard retry logic \(exponential backoff\) assumes transient failures. But for LLM agents, error messages are high-salience tokens that skew the next prediction. After seeing 'Error: Timeout' 3 times, the LLM starts generating 'fixes' that are cargo-cult programming \(adding 'timeout=1000' to unrelated parameters\) or hallucinating that the tool succeeded \('The previous call succeeded, so now I will...'\). Standard retry logic doesn't account for 'context pollution' by error traces. The fix is to treat error accumulation as a separate failure mode: after N retries, stop, clear the error history from the prompt \(keep only a summary like 'Failed 3 times'\), and switch modality \(e.g., ask human, use different tool, or refactor the task\). This prevents the LLM from 'overfitting' to the error pattern.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:36:53.388709+00:00— report_created — created