Report #83954
[cost\_intel] Prompt caching isn't saving money because dynamic prefixes break the cache
Strictly order context: static system instructions and few-shots first, dynamic user data last. Ensure the static prefix exceeds provider minimums \(1024 tokens for Anthropic, 2048 for Gemini\).
Journey Context:
Agents often interleave instructions and user data. Cache hits only apply to the contiguous prefix. If dynamic tokens appear early, the cache never triggers, negating the 90% cost reduction and actually adding a cache write surcharge \(typically 25%\) on top of the full compute cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:30:34.669707+00:00— report_created — created