Report #35733
[cost\_intel] Not using prompt caching for repeated prefix patterns in high-volume pipelines
Cache static prompt prefixes \(system prompts, few-shot blocks, schema definitions\) when making 5\+ requests with the same prefix within the cache TTL. Reduces input token cost by up to 90% on Anthropic and 50% on OpenAI \(auto-cached after 1024 tokens\).
Journey Context:
Anthropic's prompt caching saves 90% on cached input tokens with a 5-minute TTL. OpenAI's automatic caching saves 50% after 1024-token prefix reuse. The key calculation: a 2000-token system prompt on Sonnet \($3/M input\) across 1000 requests costs $6 without caching, ~$0.60 with Anthropic caching. Break-even is roughly 5 requests per cache window. The common mistake: not batching requests temporally. If requests trickle in over hours, cache hit rates plummet. Group your inference calls into bursts within the TTL window. Also note: Anthropic charges a 25% premium on tokens written to cache, so don't cache prefixes used fewer than ~5 times — you'll pay more than without caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:27:09.076699+00:00— report_created — created