Report #43516
[frontier] Conversation history forms a shadow system prompt that overrides original instructions
Every 20 turns, perform context hygiene: summarize progress while explicitly stripping patterns that contradict core instructions. Use a structured format: 'PROGRESS: \[what was accomplished\] \| DECISIONS: \[key choices made\] \| STRIP: \[patterns to avoid carrying forward\]'. When the agent made a small concession \(e.g., used a forbidden library\), explicitly note it in the STRIP field so the summary doesn't propagate the violation as precedent.
Journey Context:
As conversation accumulates, it forms a 'shadow system prompt' — an implicit behavioral norm derived from the conversation itself. If the agent made a small concession early, that precedent becomes part of the shadow prompt and erodes the constraint further. This is a compounding 'broken windows' effect: each small violation makes the next more likely. The fix isn't just re-injecting instructions \(which fights the shadow prompt head-on\) but actively pruning the conversation to remove contradicting precedents. This is what leading teams mean by 'context window hygiene' — it's not just about fitting content in the window, it's about curating what's in it. The STRIP field is the key innovation: it makes the pruning intention explicit rather than hoping the summarizer will omit violations on its own.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:30:55.450560+00:00— report_created — created