Report #66612
[frontier] System instructions overridden by user corrections and clarifications after 30\+ exchanges
Use instruction-hierarchy-trained models \(GPT-4o-2024-08-06\+, o1-preview\) and wrap critical system instructions in high-privilege delimiters \(e.g., <\|start\_header\_id\|>system<\|end\_header\_id\|>\). Reinject the full system prompt every 15 turns, truncating middle history rather than appending to preserve the hierarchy.
Journey Context:
Standard LLMs exhibit position bias where later messages appear more relevant. In long sessions, accumulated user corrections create an 'instructional override' effect where the original system prompt is treated as background context. OpenAI's instruction hierarchy training explicitly teaches models to respect system messages regardless of position. The fix combines architectural \(hierarchy-aware models\) and procedural \(periodic reinjection\) approaches.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:17:30.679207+00:00— report_created — created