Report #59145
[cost_intel] When does prompt caching fail to reduce costs in multi-tenant SaaS with diverse user prompts?
In multi-tenant environments with high prompt variance, cache hit rates can fall below 20%, at which point the 1.25x cache-write premium outweighs the savings from cached reads. Cache only shared system prompts larger than about 4k tokens, and use request pooling for user-specific content.
Journey Context:
Prompt caching only pays off when requests repeat an identical prompt prefix. In SaaS apps where each user sends a unique document (e.g., 'analyze my specific contract'), the user-specific portion dominates. If only the 500-token system instruction is shared while the 10k-token document varies, a cache hit discounts at most those 500 tokens, under 5% of the request's input, while every cache write is billed at 1.25x. Whether that nets out depends on the hit rate: assuming cached reads are billed at roughly 0.1x the base input price (an assumption; this report does not state the read price), caching breaks even at a hit rate of about 22%, i.e. roughly one read for every four writes. In multi-tenant apps with thousands of unique users, identical prefixes rarely recur often enough inside the cache TTL window to clear that bar. The solution is to cache only very large shared contexts (e.g., a 50k-token knowledge base) and treat user content as non-cacheable.
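The cost reasoning above can be checked with a small model. This is a minimal sketch, not a provider's billing formula: the 1.25x write multiplier comes from this report, while the cached-read multiplier (`read_mult=0.1`) and the simplification that every cache miss triggers a fresh write are assumptions to replace with figures from your provider's price sheet. All function names are hypothetical.

```python
def break_even_hit_rate(write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Hit rate above which caching a prefix is cheaper than not caching it.

    Per request, a cached prefix costs h*read_mult + (1-h)*write_mult in units
    of its uncached price, so caching wins when that expression is below 1.
    """
    return (write_mult - 1.0) / (write_mult - read_mult)


def cached_prefix_cost(prefix_tokens: int, requests: int, hit_rate: float,
                       write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Expected prefix cost with caching on, in base-input-token equivalents.

    Assumption: every miss is billed as a cache write, every hit as a read.
    """
    hits = requests * hit_rate
    misses = requests - hits
    return prefix_tokens * (hits * read_mult + misses * write_mult)


def saving_fraction(shared_tokens: int, unique_tokens: int, hit_rate: float,
                    write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Fraction of total input cost saved by caching only the shared prefix.

    Negative values mean caching makes the request more expensive.
    """
    uncached = shared_tokens + unique_tokens
    cached = cached_prefix_cost(shared_tokens, 1, hit_rate,
                                write_mult, read_mult) + unique_tokens
    return (uncached - cached) / uncached


if __name__ == "__main__":
    print("break-even hit rate:", break_even_hit_rate())
    # 500-token system prompt, 10k-token user document, low hit rate
    print("small prefix, 15% hits:", saving_fraction(500, 10_000, 0.15))
    # 50k-token shared knowledge base, high hit rate
    print("large prefix, 90% hits:", saving_fraction(50_000, 10_000, 0.90))
```

Under these assumed multipliers, the small shared prefix at a low hit rate yields a slightly negative saving, while the large shared knowledge base at a high hit rate saves well over half of the input cost. Changing `read_mult` moves the break-even point, so the threshold should be recomputed for each provider.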
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:46:00.343752+00:00— report_created — created