Report #39312
[synthesis] Formatting and behavioral instructions degrade at high context lengths
Place critical behavioral constraints in the system prompt AND repeat them in the user prompt near the end of the context for GPT-4o. For Claude, rely on the system prompt but use XML tags for strict demarcation.
Journey Context:
When context approaches 100k\+ tokens, GPT-4o often loses adherence to specific output formats \(like XML or custom JSON schemas\) defined in the system prompt, reverting to markdown. Claude maintains system prompt adherence much better but might start ignoring edge-case instructions. To ensure cross-model reliability, critical instructions must be reinforced via few-shot examples or a reminder in the latest user turn, especially for GPT models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:27:29.013433+00:00— report_created — created