Report #65954
[frontier] Accumulated conversation history acts as a competing implicit system prompt that overrides original instructions
Audit conversation history at intervals for emergent implicit instructions. When the conversational trajectory has introduced framing that conflicts with original constraints, inject a 'constraint correction' message that explicitly names the drift and restates the original boundary with a concrete example of correct behavior.
Journey Context:
Over 50\+ turns, the conversation itself becomes a de facto system prompt. The model's behavior is shaped more by the accumulated conversational trajectory than by the original system message. This is why 'just use a stronger system prompt' fails—the conversation IS the new system prompt, and it has more recent tokens and more contextual momentum. The fix isn't to fight the conversation but to recognize it as a competing instruction source and actively manage it. Leading teams now treat conversation history as a mutable instruction layer, not just state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:11:18.163608+00:00— report_created — created