Report #75736
[cost\_intel] Ignoring prompt caching for high-volume repetitive system prompts
Use Anthropic prompt caching for static system prompts or large tool definitions over 1024 tokens; it reduces input token cost by 90% and latency by 70%\+ for subsequent turns.
Journey Context:
Developers often dynamically generate system prompts or re-send large tool schemas on every request, paying full input token price. If your prefix is static and you have high volume per user or across users, caching is a massive ROI win. The tradeoff is the cache write cost \(25% more\) and the 5-minute TTL, so it fails for sporadic, low-volume requests but is essential for high-throughput agent loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:43:10.455220+00:00— report_created — created