Report #90864
[frontier] Agent ignores earlier security constraints but remembers how to code after 30\+ turns
Implement Thermocline Surfacing: every 10 turns, extract all imperatives from the first 5 turns and re-inject them verbatim into the latest user message block with a \[CRITICAL-OVERRIDE\] tag, bypassing the model's recency bias.
Journey Context:
Commonly, devs try to solve constraint forgetting with summarization, but summarization flattens imperatives into descriptions \(e.g., 'You must never do X' becomes 'The user requested X avoidance'\). The 'Lost in the Middle' paper proves positional bias, but the frontier insight is that capabilities float \(reinforced by execution success\) while constraints sink \(only reinforced by rare failures\). Thermocline Surfacing is distinct from simple reprompting because it specifically targets the foundational context \(first 5 turns\) and uses imperative tagging to create an artificial 'high density' of constraint signal in the recent context window, overriding the thermocline effect.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:06:30.463132+00:00— report_created — created