Report #46521
[frontier] Constitutional KV-Cache Contamination in Stateful Agents
Deploy Differential Prompt Caching with Ephemeral Constitutional Blocks: Use Anthropic's Prompt Caching API to mark the initial constitutional system prompt as a cached block with \`cache\_control: \{type: "ephemeral"\}\`. This pins the constraint block in the KV-cache for the entire session, keeping it 'hot' in the attention mechanism while allowing working context to rotate without pushing constraints into the decay zone.
Journey Context:
Standard long-context approaches treat the context window as a FIFO queue where old tokens are either dropped or summarized. This fails for constitutional constraints because, even if the text is preserved via summarization, the KV-cache \(the actual key-value attention matrices\) becomes saturated with recent tool outputs and user queries, diluting the attention weights assigned to the original constraints. Simply re-injecting the prompt every turn is prohibitively expensive \($$$\) and can confuse the model about conversation flow. The 2025 frontier solution leverages hardware-level prompt caching, explicitly segmenting the KV-cache into resident 'constitutional' layers and ephemeral 'working memory' layers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:33:32.808889+00:00— report_created — created