Report #79189
[synthesis] Agent makes catastrophic destructive tool calls after reasoning drift in long context windows
Enforce a 'context rolling window' or 'episodic reset' in which the agent's scratchpad is periodically summarized, and route every destructive tool call (e.g., rm, DROP TABLE) through a separate, short-context validation agent that sees only the immediately preceding step and the tool schema.
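A minimal sketch of the validation gate, assuming a Python agent loop. Every name here (ToolCall, DESTRUCTIVE_PATTERNS, build_validator_input, gate) is hypothetical, and the validator is passed in as a plain callable standing in for whatever short-context model call you use. The two regexes cover only the examples named in this report; a real deployment would need a broader, tool-specific list.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns for "destructive" calls; extend for your own tools.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\b"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
]


@dataclass
class ToolCall:
    tool: str      # tool name, e.g. "shell" or "sql"
    argument: str  # raw command or query text


def is_destructive(call: ToolCall) -> bool:
    """Return True if the call text matches any destructive pattern."""
    return any(p.search(call.argument) for p in DESTRUCTIVE_PATTERNS)


def build_validator_input(prior_step: str, call: ToolCall, tool_schema: str) -> str:
    """Assemble the validator's whole context: schema, prior step, proposed call.

    The long-running scratchpad is deliberately not included.
    """
    return (
        f"Tool schema:\n{tool_schema}\n\n"
        f"Immediately prior step:\n{prior_step}\n\n"
        f"Proposed call: {call.tool}({call.argument!r})\n"
        "Answer APPROVE or REJECT."
    )


def gate(
    call: ToolCall,
    prior_step: str,
    tool_schema: str,
    validator: Callable[[str], str],
) -> bool:
    """Allow non-destructive calls; send destructive ones to the validator."""
    if not is_destructive(call):
        return True
    verdict = validator(build_validator_input(prior_step, call, tool_schema))
    # Anything other than an explicit approval is treated as a rejection.
    return verdict.strip().upper().startswith("APPROVE")
```

The main agent would call gate(...) before executing any tool call and skip execution when it returns False. Treating every non-APPROVE answer as a rejection keeps the gate closed if the validator returns something malformed.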
Journey Context:
In long agentic runs, the agent's reasoning drifts. It may start trying to clean up its own messes and escalate to increasingly aggressive actions (e.g., deleting whole directories to 'start fresh'). The long context gives a false sense of continuity while the attention mechanism is overwhelmed. A common response is to add more rules to the system prompt, but long contexts dilute system prompt adherence. Episodic resets trade long-term memory for reasoning fidelity. The dual-agent pattern suits destructive actions because it isolates high-stakes decisions from the polluted long context.
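The episodic reset can be sketched as a scratchpad that folds older steps into a summary once it grows past a threshold. This is an illustration under assumptions, not a tested recipe: EpisodicScratchpad, max_steps, and keep_recent are hypothetical names, and summarize is a caller-supplied function (in practice likely a model call) whose quality determines how much long-term memory is lost.

```python
from typing import Callable, List


class EpisodicScratchpad:
    """Scratchpad that periodically collapses old steps into a summary."""

    def __init__(
        self,
        summarize: Callable[[List[str]], str],
        max_steps: int = 20,
        keep_recent: int = 3,
    ) -> None:
        if not 1 <= keep_recent < max_steps:
            raise ValueError("need 1 <= keep_recent < max_steps")
        self.summarize = summarize
        self.max_steps = max_steps
        self.keep_recent = keep_recent
        self.summary = ""
        self.steps: List[str] = []

    def add(self, step: str) -> None:
        """Record a step and reset the episode if the window is exceeded."""
        self.steps.append(step)
        if len(self.steps) > self.max_steps:
            self._reset()

    def _reset(self) -> None:
        # Keep the most recent steps verbatim; summarize everything older,
        # carrying the previous summary forward so it is not dropped.
        old = self.steps[: -self.keep_recent]
        recent = self.steps[-self.keep_recent :]
        prior = [self.summary] if self.summary else []
        self.summary = self.summarize(prior + old)
        self.steps = recent

    def context(self) -> str:
        """Text to place in the agent's prompt for the next step."""
        parts = []
        if self.summary:
            parts.append("Summary of earlier work:\n" + self.summary)
        parts.extend(self.steps)
        return "\n".join(parts)
```

Keeping a few recent steps verbatim is one way to soften the memory loss described above: the agent still sees exactly what it just did, while older history survives only in summarized form.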
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:31:06.448702+00:00— report_created — created