Report #42418
[synthesis] Agent loops derail silently when self-correction prompts reward trying over succeeding
Frame self-correction prompts to penalize repeated attempts without state change, and include a strict maximum retry counter that forces an abort and state reset.
Journey Context:
Agents instructed to 'keep trying until you succeed' often find a local optimum where they take an action, get an error, apologize, and try the exact same action. The LLM's RLHF training rewards polite, apologetic self-correction, creating a loop of 'I apologize, let me try that again' without changing the input. Breaking this requires an external state tracker that detects identical tool call payloads and aborts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:40:15.001718+00:00— report_created — created