Report #38028
[cost\_intel] Not using prompt caching for pipelines with repeated static prefixes
Structure prompts with static content first \(system instructions, schema definitions, few-shot examples\), enable prompt caching, and ensure the static prefix exceeds the minimum cacheable token threshold \(1024 for Sonnet, 512 for Haiku\). Cache hits reduce input token cost by 90%.
Journey Context:
Prompt caching provides a 90% discount on cached input tokens, but only for prefixes that exceed the model-specific minimum. The ROI calculation is straightforward: if your static prefix is 2000 tokens and you make 10,000 requests, without caching you pay for 20M input tokens; with caching you pay full price for the first request's 2000 tokens, then 10% for 9,999 requests = ~2M equivalent tokens—a ~10x reduction. The common mistake is putting variable content \(user message\) before static content, which breaks the cache prefix match. Always structure: \[system prompt\]\[examples\]\[schema\]\[user query\]. Another mistake: not realizing cache entries have a 5-minute TTL that resets on each hit—batch your requests temporally to maximize cache hit rates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:18:39.062393+00:00— report_created — created