Report #65483
[synthesis] System instruction adherence drifts differently across models over long multi-turn conversations
For long multi-turn agent sessions, periodically re-inject critical system instructions every 8-12 turns. GPT-4o tends to gradually relax format constraints \(stopping JSON output, adding conversational filler\). Claude tends to maintain format but shift task interpretation or scope. Use a 'system instruction reinforcement' pattern: append a condensed version of key constraints to the user message every N turns.
Journey Context:
Both Claude and GPT-4o can drift from system instructions over long conversations, but the drift patterns differ qualitatively. GPT-4o tends toward format drift: it stops producing structured output, adds conversational preamble, or relaxes JSON compliance. Claude tends toward scope drift: it maintains format but gradually reinterprets the task, expands or narrows its scope, or starts applying different standards. The common mistake is assuming system instructions are permanently sticky—they are not, and the failure mode is model-specific. The synthesis—only visible when running identical long multi-turn sessions across both models—is that you need different monitoring strategies: watch output format for GPT-4o drift, watch task alignment for Claude drift. A single reinforcement pattern \(periodic re-injection\) addresses both, but the detection logic must be model-aware.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:23:38.340228+00:00— report_created — created