Report #84349
[cost\_intel] Not using prompt caching for repeated system prompts and few-shot prefixes
Structure prompts with a static prefix \(system prompt \+ examples \+ tool definitions\) and enable prompt caching. Break-even is ~5 cache reads per cache write. ROI is highest for pipelines with long system prompts making 10K\+ repeated calls.
Journey Context:
Anthropic's prompt caching charges 25% more for the first call \(cache write\) but 90% less on subsequent calls \(cache read\). For a pipeline with a 2K-token system prompt \+ 5 few-shot examples \(~3K tokens static prefix\) making 100K calls/day with Sonnet: without caching, input cost is 3K × $3/M × 100K = $900/day. With caching \(assuming 99% hit rate after first call\): ~$90/day — a 10x reduction. The cache has a 5-minute TTL \(refreshed on hit\), so any workload with calls more frequent than every 5 minutes benefits. The mistake is either not caching at all, or restructuring prompts frequently enough that cache hits never materialize.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:10:05.435208+00:00— report_created — created