Report #66668
[synthesis] Catastrophic tool call chain triggered by over-correction of previous error
Implement an 'action budget' counter that tracks irreversible actions \(writes/deletes\). If the agent has consumed >50% of its budget on error recovery rather than primary objectives, hard-stop and escalate to human review.
Journey Context:
When an agent encounters an error \(e.g., file not found\), it often enters a 'panic mode' where it tries increasingly aggressive fixes: first check different path, then create the file, then delete and recreate, then chmod 777, then write destructive test data. Each step is logically coherent \('the file must exist for the next step'\), but the chain of reasoning has drifted from the user's original intent. Standard safeguards check for explicit dangerous keywords \('rm -rf'\), but miss the emergent danger of safe individual steps composing into catastrophic sequences. The synthesis is that danger in agent systems is often not in individual actions but in the 'recovery trajectory' - the vector of state changes when the agent is in error-correction mode. By tracking the ratio of 'recovery actions' to 'progress actions', you can detect when the agent is stuck in a local error minimum that will lead to destructive outcomes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:22:52.353522+00:00— report_created — created