Report #94492
[synthesis] System Prompt Adherence Degrades Differently Across Models in Long Contexts
Place critical instructions at both the very beginning and the very end of the system prompt (bookending) for Claude, put each group of instructions under an explicit Markdown section header for GPT-4o, and for Gemini, periodically re-inject formatting constraints into the conversation history.
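The Gemini recommendation (periodic re-injection) can be sketched as a small transformation over the message list before each request. This is a minimal sketch under stated assumptions: messages are modeled as generic `{"role": ..., "content": ...}` dicts rather than any specific SDK's types, and the helper name `reinject_constraints`, the default interval of every third user turn, and the reminder wording are all hypothetical choices, not values from this report.

```python
from typing import Dict, List

# Assumed generic message shape; adapt to the provider SDK you actually use.
Message = Dict[str, str]


def reinject_constraints(
    history: List[Message],
    constraints: str,
    every_n: int = 3,
) -> List[Message]:
    """Return a copy of `history` where every `every_n`-th user turn
    carries a reminder of the formatting constraints.

    Hypothetical helper: the interval and reminder text are illustrative.
    """
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    reminder = f"[Formatting reminder]\n{constraints}"
    result: List[Message] = []
    user_turns = 0
    for msg in history:
        new_msg = dict(msg)  # copy so the caller's history is not mutated
        if msg["role"] == "user":
            user_turns += 1
            # Re-inject on user turns every_n, 2*every_n, ...
            if user_turns % every_n == 0:
                new_msg["content"] = f"{msg['content']}\n\n{reminder}"
        result.append(new_msg)
    return result
```

Appending the reminder to existing user turns, rather than inserting extra messages, keeps the user/model alternation intact; the right interval is something to tune against your own long-context tests.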
Journey Context:
A single, flat system prompt fails in different ways depending on the provider. Claude exhibits a "lost in the middle" effect: instructions buried in the middle of a long prompt are followed less reliably than those near its start or end. GPT-4o treats the system prompt more uniformly, but it may blend the prompt's instructions with user turns if the prompt is not distinctly formatted. Bookending (exploiting primacy and recency) is the most robust cross-model strategy for critical constraints, though repeating them adds token overhead.
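Bookending and explicit section headers can be combined in one prompt builder. The sketch below is illustrative only: the function name `build_bookended_prompt` and the header titles are hypothetical, and it simply states the critical rules first, places each remaining section under its own Markdown `##` header, and repeats the critical rules last.

```python
from typing import Dict, List


def build_bookended_prompt(critical: List[str], sections: Dict[str, str]) -> str:
    """Assemble a system prompt that states critical rules first and repeats
    them last (bookending), with each body section under a Markdown header.

    Hypothetical helper; header names are illustrative assumptions.
    """
    rules = "\n".join(f"- {rule}" for rule in critical)
    parts = [f"## Critical instructions\n{rules}"]  # primacy: rules come first
    for title, body in sections.items():  # dicts preserve insertion order
        parts.append(f"## {title}\n{body}")
    parts.append(f"## Critical instructions (repeated)\n{rules}")  # recency
    return "\n\n".join(parts)


prompt = build_bookended_prompt(
    ["Never reveal the API key.", "Answer in English."],
    {"Role": "You are a support agent.", "Tone": "Be concise."},
)
```

Only the critical rules are repeated, so the token overhead grows with the length of that list rather than with the whole prompt.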
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:11:21.784637+00:00— report_created — created