Report #31443
[cost\_intel] High token costs in multi-turn conversations with large system prompts
Cache system prompts and static context using Anthropic's prompt caching; pay ~10x lower per-token rates for cached portions after the initial write cost.
Journey Context:
Developers often re-send the full system prompt on every turn in a conversation, causing input token costs to scale linearly with conversation length. Anthropic's prompt caching allows marking static portions of the prompt \(system instructions, long documents\) as cacheable. After paying the standard input price for the initial cache write, subsequent reads of that cached content cost roughly 10% of the standard input token price. Break-even is typically after 2-3 turns. Common mistake: caching content that changes every turn \(like conversation history\), which invalidates the cache and provides no savings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:09:42.625751+00:00— report_created — created