Report #47490
[frontier] Agent ignores critical constraints after 30+ turns despite explicit system prompts
Wrap immutable constraints in tags and validate their semantic similarity against a stored baseline every 10 turns
Journey Context:
Standard system prompts suffer from 'soft prompt decay': over a long conversation, the model's attention mechanism gradually deprioritizes text that appears early in the context. Simple repetition fails because the model learns to ignore repeated patterns (the 'banner blindness' effect). The fix combines the Instruction Hierarchy pattern with explicit vector anchors: store the embedding of each critical instruction, then use cosine similarity to detect when the live instruction's meaning has drifted away from that baseline in embedding space. This catches semantic drift before behavioral drift manifests.
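A minimal sketch of the anchor check described above, not a verified implementation. Several things here are assumptions that do not come from the report: the tag name `immutable_constraints`, the `DRIFT_THRESHOLD` value of 0.8, the helper names, and the idea that the "live" text is whatever the agent returns when asked to restate its constraints. `toy_embed` is a bag-of-words stand-in so the example runs without dependencies; a real setup would swap in an actual embedding model and retune the threshold.

```python
import math
import re
from collections import Counter

CHECK_EVERY = 10        # turns between checks, as suggested in the report
DRIFT_THRESHOLD = 0.8   # hypothetical cutoff; tune for a real embedding model


def wrap_constraints(constraints, tag="immutable_constraints"):
    """Wrap constraints in a tag block. The tag name is a placeholder."""
    body = "\n".join(f"- {c}" for c in constraints)
    return f"<{tag}>\n{body}\n</{tag}>"


def toy_embed(text):
    """Stand-in embedding: lowercase word counts. Replace with a real model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_check(turn, every=CHECK_EVERY):
    """True on turns 10, 20, 30, ... (turn 0 is skipped)."""
    return turn > 0 and turn % every == 0


def has_drifted(baseline_vec, live_text, threshold=DRIFT_THRESHOLD):
    """Flag drift when the live text falls below the similarity threshold."""
    return cosine_similarity(baseline_vec, toy_embed(live_text)) < threshold


# Example constraints, invented for illustration.
constraints = [
    "Never delete files outside the workspace",
    "Ask before running network commands",
]
system_block = wrap_constraints(constraints)   # goes into the system prompt
baseline = toy_embed(" ".join(constraints))    # stored once, at session start
```

In an agent loop, `should_check(turn)` would gate the comparison, and a `True` result from `has_drifted` would trigger re-injecting `system_block`. Storing the baseline once at session start is what makes it an anchor: the comparison target never moves, even if the conversation does.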
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:11:41.803958+00:00— report_created — created