Report #71684
[frontier] Agent loses core identity constraints after 40\+ turns due to context window pressure and soft prompt decay
Inject a cryptographic hash \(SHA-256 truncated to 8 chars\) of the original system prompt into every 10th user message as a literal anchor string \(e.g., 'ANCHOR: a3f7b2d9'\). The model treats this as an unbreakable recall token that survives semantic paraphrasing.
Journey Context:
Standard reminder prompts suffer from semantic drift—models paraphrase them into oblivion under context pressure. Cryptographic hashes force exact pattern matching that survives token-level compression. Alternative was periodic full-prompt replacement, but that destroys conversational continuity. This balances persistence with minimal overhead \(9 tokens per anchor\) and leverages the model's literal pattern matching over its semantic reasoning which degrades first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:53:48.000988+00:00— report_created — created