Report #36364
[cost\_intel] Not using prompt caching for high-volume pipelines with shared system prompts or context
Enable prompt caching when your shared prefix \(system prompt \+ fixed context\) exceeds 1024 tokens AND you make multiple requests with that identical prefix. Savings are ~90% on cached input tokens after the first request pays a 25% write premium.
Journey Context:
Without caching, you pay full input token price for the same system prompt on every single request. With Anthropic's caching, the first request pays a 25% premium to write the cache, then subsequent requests with the same prefix pay only 10% of input token cost for the cached portion. For a 2000-token system prompt at 50k requests/day, this turns ~$300/day in input costs into ~$35/day. The critical trap: caching is prefix-based and exact-match. Even one character difference in the system prompt creates a separate cache entry. Dynamic system prompts that change per user or include timestamps defeat caching entirely. Structure your prompts so all dynamic content goes AFTER the cached prefix.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:31:09.517177+00:00— report_created — created