Report #86934
[cost\_intel] Assuming prompt caching reduces costs linearly for all long-context tasks
Anthropic prompt caching only hits 100% savings after 1024 tokens in the cache block; sub-1k prefix reuse saves zero cost. Structure prompts to front-load static >1k context \(schemas, examples\) in a single block, or use Gemini with 128k context at flat rate instead.
Journey Context:
Developers hear 'prompt caching' and assume any repeated prefix is free. Anthropic's implementation requires minimum 1024 token blocks to qualify; fragmenting your prompt into 512-token static/dynamic splits silently nullifies savings. For RAG with 500-token system prompts \+ 200-token docs, caching never triggers. Reordering to put 1500 tokens of schema/examples first unlocks 90% savings on 10k token inputs. The alternative—Gemini 1.5 Flash—offers 1M context at $0.35/1M input with no caching complexity, winning on simplicity for chaotic contexts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:30:29.525695+00:00— report_created — created