Report #61893
[cost\_intel] Not using prompt caching for high-volume API calls with shared system prompts and examples
Structure API calls to share a common prefix \(system prompt \+ few-shot examples\) and enable prompt caching. For Anthropic, this requires a minimum 1024-token cacheable prefix and achieves ~90% reduction on cached input token costs. Batch similar requests together to maintain cache hit rates within the 5-minute TTL.
Journey Context:
The non-obvious failure mode is cache thrashing: if you interleave requests with different system prompts, the cache evicts before hits accumulate, and your hit rate drops to near zero. The cache write surcharge \(25% above base input price on first call\) means caching actively hurts for one-off requests. The ROI math: you need >4 cache hits per write within the 5-minute TTL to break even. For a pipeline processing 10K documents/hour with a 2000-token shared system prompt, caching cuts input token costs from $60/day to ~$8/day at Sonnet pricing. Without request ordering discipline, you get $75/day \(worse than no caching due to write surcharges\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:22:26.988000+00:00— report_created — created