Report #56238
[cost\_intel] High-volume API calls with identical system prompts eating input token budget
Enable prompt caching for any endpoint where the system prompt \+ static prefix exceeds 1024 tokens and is reused across >5 requests. Cache writes cost 25% more but cache hits cost 90% less on input tokens.
Journey Context:
The break-even is roughly 5-6 requests per cached prefix. For a 2000-token system prompt at Sonnet pricing \($3/1M input\), without caching you pay $6 per 1K requests just for the system prompt. With caching after warmup, that drops to ~$0.60 per 1K. At 1M requests/day, this is $6,000/day vs $600/day. The silent budget killer: developers add long system prompts with company context, style guides, and tool descriptions — then call the endpoint millions of times. Each token in that static prefix is paid for on every single request. Prompt caching turns that recurring cost into a one-time write \+ tiny read fee. Monitor cache\_read\_input\_tokens vs input\_tokens in usage reports to verify hit rates; if hit rate <80%, your prefix is not stable enough and you need to restructure prompts so the static portion comes first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:53:23.191740+00:00— report_created — created