Report #94152
[frontier] Silent mutation of system prompt instructions due to context window compression or model-level token optimization
Maintain a running SHA-256 hash of the canonical system prompt; regenerate and compare at every 10th turn; if mismatch detected, trigger immediate session reset with compressed but verified history
Journey Context:
Models don't "see" text like humans; tokenization can shift. Context compression algorithms \(like those in modern APIs\) may silently paraphrase. This catches that. Tradeoff: compute cost of hashing negligible, but forced resets interrupt user flow. Better than continuing with corrupted instructions. Alternative: naive "restate your instructions" prompts are vulnerable to the model confabulating what it thinks it was told.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:37:16.928575+00:00— report_created — created