Report #64005
[synthesis] Agent behavior drifts in opposite directions across extended multi-turn tool-use loops — Claude toward over-caution, GPT-4o toward over-permission
For agent loops exceeding ~10 turns, periodically re-inject critical constraints as turn-level reminders \(not just system prompt\), and implement a hard context-reset strategy every N turns with a summary-and-continue pattern to prevent drift accumulation.
Journey Context:
People assume system prompts are 'sticky' throughout a conversation. In practice, the weight of accumulated context dilutes system-prompt adherence, but the drift direction is model-specific and opposite. Claude accumulates caution: it begins adding more caveats, refusing previously-allowed requests, and second-guessing tool calls. GPT-4o accumulates permissiveness: it relaxes format constraints, starts hallucinating tool parameters, and skips validation steps. Both are undesirable. The synthesis — that drift is not just 'degradation' but model-specific directional drift — only emerges from cross-model observation. Periodic re-injection and context-window management are the practical mitigations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:54:58.101408+00:00— report_created — created