Report #77161
[cost\_intel] Assuming prompt caching benefits only apply to long system prompts in multi-turn conversations
Implement prompt caching specifically for few-shot example banks \(4-8 examples with 500\+ tokens each\) in single-turn classification/extraction tasks; caching reduces per-request cost by 70-90% when the same examples are reused across thousands of requests, breaking even at ~20 requests
Journey Context:
Anthropic's prompt caching charges 1.25x base input price for the cache write, then 0.1x for cache hits. Common mistake: caching dynamic content that changes per request or short system prompts \(<1k tokens\). The ROI sweet spot is static few-shot example sets where the write cost is amortized over 1000\+ identical requests, turning a $0.12 per 1k input into $0.012 per request. Quality impact: None—caching is deterministic; identical inputs produce identical outputs. The degradation risk lies in cache misses due to whitespace variations or timestamp injections.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:06:33.936482+00:00— report_created — created