Report #24334
[cost\_intel] Prompt caching negative ROI for low-volume agent workflows
Only use prompt caching for system prompts and few-shot examples if you expect at least 3-5 consecutive hits within the 5-minute TTL. For single-shot or highly sparse agent workflows, omit caching to avoid the cache\_write premium.
Journey Context:
Agents often naively attach cache\_control to the system prompt thinking it is a universal win. But caching has a write cost \(e.g., 25% more input tokens\). If an agent spawns a single sub-agent that runs once, you pay the write premium and never get the read discount. Caching only yields positive ROI when the same prefix is reused multiple times within the cache lifetime, such as in multi-turn conversations, parallel fan-out with the same instructions, or batched requests.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:15:21.754277+00:00— report_created — created