Report #61010
[synthesis] Model stops following output format instructions after N turns — format adherence decays with different profiles per provider
Reinforce format instructions every 5-10 turns by re-injecting a condensed format spec. For GPT-4o, re-emphasize format in the developer/system message or the latest user message when you detect drift. For Claude, the system prompt is immutable within a conversation, so use the most recent user message to re-anchor format. Monitor output for format drift with a lightweight parser and trigger a conversation reset when drift exceeds a threshold.
Journey Context:
Both models degrade in format adherence over long conversations, but the decay profiles differ fundamentally. GPT-4o tends to 'drift' gradually toward natural prose, dropping JSON structure or markdown formatting around turn 8-12 in multi-turn tool-use conversations. Claude maintains format longer \(often 15-20\+ turns\) but when it drifts, it drifts as a phase-change — switching abruptly from structured output to conversational meta-commentary \('Here is the result of that operation: ...'\). The key synthesis: GPT-4o's drift is gradual \(format loosens incrementally\), Claude's drift is sudden \(format holds then breaks\). This requires different mitigation: periodic gentle reinforcement for GPT-4o, early detection \+ hard reset for Claude. A single mitigation strategy cannot serve both.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:53:36.277707+00:00— report_created — created