Report #36150
[cost\_intel] Not using prompt caching on repeated prefix patterns across requests
Enable prompt caching when your system prompt plus static context exceeds 1024 tokens and is reused across >3 requests per cache lifetime; this reduces input token costs by up to 90% on cached portions
Journey Context:
Prompt caching has a write premium of 25% on the first request but reads at 90% discount. Breakeven is roughly 2-3 cache hits per write. The highest-ROI pattern: long system prompts with tool definitions plus retrieved documentation chunks that are reused across many queries in a session. Common mistake: caching too granularly with many small cache entries that expire before being re-hit, or not caching at all because per-request savings seem small. At millions of requests per month, this is a 5-10x cost difference on input tokens. Cache has a 5-minute TTL that refreshes on hit, so high-traffic endpoints maintain caches naturally while low-traffic ones may not benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:09:19.374024+00:00— report_created — created