Report #96938
[cost_intel] Assuming prompt caching provides uniform cost savings across all task types
Aggressively structure prompts for caching in RAG and few-shot classification (where system prompts/context are static); do not engineer for caching in dynamic, single-turn conversational tasks.
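One way to structure a few-shot classification prompt for caching is to keep everything static at the front and append only the per-request input at the end. The sketch below is illustrative only: `STATIC_SYSTEM_PROMPT`, `FEW_SHOT_EXAMPLES`, `build_prompt`, and `prefix_fingerprint` are hypothetical names, and no provider API is called.

```python
import hashlib

# Hypothetical static content; in practice this would be the long system
# prompt and few-shot examples that never change between requests.
STATIC_SYSTEM_PROMPT = "You are a classifier. Label each ticket as BUG, FEATURE, or QUESTION."
FEW_SHOT_EXAMPLES = [
    ("App crashes on login", "BUG"),
    ("Please add dark mode", "FEATURE"),
]


def build_prompt(user_input: str) -> tuple[str, str]:
    """Return (static_prefix, dynamic_suffix); only the suffix varies per request."""
    shots = "\n".join(f"Ticket: {text}\nLabel: {label}" for text, label in FEW_SHOT_EXAMPLES)
    prefix = f"{STATIC_SYSTEM_PROMPT}\n\n{shots}\n\n"
    suffix = f"Ticket: {user_input}\nLabel:"
    return prefix, suffix


def prefix_fingerprint(prefix: str) -> str:
    """Hash the prefix so accidental drift (timestamps, reordered examples) is easy to spot."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    p1, s1 = build_prompt("Export to CSV fails")
    p2, s2 = build_prompt("How do I reset my password?")
    # Identical fingerprints mean both requests share a cacheable prefix.
    print(prefix_fingerprint(p1) == prefix_fingerprint(p2))
```

Logging the fingerprint alongside each request is a cheap way to notice when something dynamic (a date, a user ID, a shuffled example order) has leaked into the part of the prompt that was meant to stay fixed.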
Journey Context:
Prompt caching saves up to ~90% on the cost of cached input tokens (the exact discount varies by provider), but only for the part of the prompt whose prefix is identical across requests. In RAG, a 10k-token static system prompt plus retrieved-context prefix yields a large return (often a 5-10x cost reduction over non-cached requests). In dynamic chat, the conversational history changes every turn, breaking the cache. Developers often try to force caching on chat by truncating history, which degrades quality, instead of recognizing that caching is structurally useless for highly variable prefixes.
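The arithmetic behind this can be sketched with a small cost model. All numbers here are assumptions for illustration, not any provider's published rates: `base_per_mtok` is a made-up base price, `read_multiplier` of 0.10 mirrors the ~90% saving on cached reads, `write_multiplier` models a one-time premium for writing the prefix into the cache, and `hit_rate` is the fraction of requests that find the prefix already cached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    # Illustrative assumptions only; check your provider's actual pricing.
    base_per_mtok: float = 3.00     # dollars per million uncached input tokens
    read_multiplier: float = 0.10   # cached read billed at 10% of base
    write_multiplier: float = 1.25  # cache write billed at a premium over base


def input_cost(prefix_tokens, suffix_tokens, requests, hit_rate, pricing=CachePricing()):
    """Return (uncached_cost, cached_cost) in dollars for `requests` calls."""
    per_tok = pricing.base_per_mtok / 1_000_000
    uncached = requests * (prefix_tokens + suffix_tokens) * per_tok
    hits = requests * hit_rate
    misses = requests - hits
    cached = (
        hits * prefix_tokens * per_tok * pricing.read_multiplier       # cheap cache reads
        + misses * prefix_tokens * per_tok * pricing.write_multiplier  # cache (re)writes
        + requests * suffix_tokens * per_tok                           # suffix is never cached
    )
    return uncached, cached


if __name__ == "__main__":
    # RAG-like workload: long stable prefix, almost every request hits the cache.
    rag_uncached, rag_cached = input_cost(10_000, 500, 1_000, hit_rate=0.99)
    print(f"RAG:  {rag_uncached:.2f} vs {rag_cached:.2f}")
    # Highly variable prefix: every request misses and pays the write premium.
    chat_uncached, chat_cached = input_cost(2_000, 500, 1_000, hit_rate=0.0)
    print(f"Chat: {chat_uncached:.2f} vs {chat_cached:.2f}")
```

Under these assumed rates the RAG-like workload comes out roughly 6.5x cheaper with caching, while the zero-hit workload costs more with caching than without it, because every request pays the write premium and never gets a discounted read.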
Lifecycle
2026-06-22T21:17:43.110531+00:00— report_created — created