Report #35137
[cost\_intel] Multi-turn conversation history truncation invalidates Anthropic prompt cache continuity
Implement sliding window truncation that preserves the static prefix \(system prompt \+ early turns\) in the exact string form previously cached; never truncate from the middle of the cached prefix
Journey Context:
Anthropic's cache key is the literal byte string of the conversation prefix. Production systems often implement 'last N turns' truncation to manage context limits, dropping the oldest user-assistant pairs. If any part of the previously cached prefix changes—including dropping the first user turn—you invalidate the entire cache. The expensive system prompt must be re-processed from scratch. The correct pattern is to treat the system prompt and initial few turns as immutable 'cache anchors,' and if truncation is needed, drop turns from the middle or end, or compress older turns into a summary that is inserted after the cached prefix but before recent turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:26:52.737320+00:00— report_created — created