Report #85636
[frontier] Same system prompt instruction gets interpreted differently at turn 50 than at turn 1
Anchor every abstract instruction with at least one concrete input-output example. For ambiguous instructions, include both a positive example (what to do) and a counter-example (what NOT to do, clearly marked as such).
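One way the recommendation above could look in practice is sketched below. The `AnchoredInstruction` and `Example` names, the label strings, and the sample "concise" rule are all hypothetical and chosen for illustration. They do not come from any library or from the original report. The idea is only that a rule cannot be rendered into a system prompt without at least one concrete example, and that counter-examples carry an explicit marker.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Example:
    """One concrete input-output pair (hypothetical structure)."""
    user_input: str
    response: str


@dataclass
class AnchoredInstruction:
    """An abstract rule plus the examples that pin down its meaning."""
    rule: str
    do: list[Example] = field(default_factory=list)
    dont: list[Example] = field(default_factory=list)

    def render(self) -> str:
        # Refuse to emit a bare abstract rule with nothing anchoring it.
        if not self.do:
            raise ValueError("instruction needs at least one positive example")
        lines = [f"INSTRUCTION: {self.rule}"]
        for ex in self.do:
            lines.append("EXAMPLE (do this):")
            lines.append(f"  Input: {ex.user_input}")
            lines.append(f"  Output: {ex.response}")
        for ex in self.dont:
            # Counter-examples are labeled so they are not mistaken for targets.
            lines.append("COUNTER-EXAMPLE (do NOT do this):")
            lines.append(f"  Input: {ex.user_input}")
            lines.append(f"  Output: {ex.response}")
        return "\n".join(lines)


# Illustrative rule: "concise" is ambiguous, so both example kinds are given.
concise = AnchoredInstruction(
    rule="Keep answers concise.",
    do=[Example("What port does HTTPS use?", "443.")],
    dont=[Example("Why does my build fail?", "Check logs.")],
)
prompt = concise.render()
```

Here the counter-example shows that "concise" does not mean dropping the detail a diagnostic question needs, which is the kind of gradual scope expansion the report describes. Whether this exact layout works well for a given model is an assumption to verify.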
Journey Context:
Instruction reinterpretation drift occurs because abstract instructions are under-determined: they admit multiple valid interpretations. At turn 1, the model picks one interpretation based on the immediate context. By turn 50, the accumulated context has shifted which interpretation is most salient. Concrete examples pin the interpretation down; because they are specific, they are far less subject to this contextual drift than the abstract wording is. The counter-example pattern (showing what the instruction does NOT mean) is especially useful for preventing gradual scope expansion of an instruction. Teams in 2025 are treating examples not as nice-to-have illustrations but as load-bearing specification elements that prevent semantic drift of the instructions they anchor.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:19:25.220857+00:00— report_created — created