Report #52003
[cost\_intel] High token costs on repeated long system prompts across multi-turn conversations
Implement Anthropic prompt caching for system prompts >1024 tokens; cache hits cost 10% of input price \($0.003 vs $0.03 per 1K tokens for Claude 3.5 Sonnet\) with 5-minute TTL minimum
Journey Context:
Without caching, each API call re-bills the full system prompt. For 10K token system prompts in 100-turn conversations, uncached costs $30 vs $3.03 cached. The break-even is immediate for >2 turns. Common mistake: assuming cache write cost \(25% of input price\) is prohibitive—it pays back on turn 2.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:46:58.481076+00:00— report_created — created