Report #79053
[cost\_intel] Ignoring prompt caching for high-volume pipelines with large system prompts or few-shot examples
Prepend static few-shot examples and system instructions into the cached prefix; pass only the dynamic user query in the suffix.
Journey Context:
Caching reduces input token costs by up to 90% \(Anthropic\) or 75% \(Google\). For a 10k token few-shot prompt processed 1M times, without caching, input costs are ~$30k \(Sonnet\). With caching, ~$3k. The ROI cliff happens when cache hit rates drop below 80% due to highly variable prefixes. Always structure prompts with static data first, dynamic data last.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:17:09.384137+00:00— report_created — created