Report #90867
[cost\_intel] Ignoring prompt caching for multi-turn agentic loops or few-shot templates
Structure prompts with static prefixes \(system \+ few shots\) to hit cache thresholds \(e.g., 1024 tokens for Anthropic\). Saves 90% input cost and reduces latency.
Journey Context:
Developers often concatenate context dynamically, breaking the cache prefix. Refactoring to put static instructions first is trivial but unlocks massive savings. If your system prompt is 2k tokens and you run 10k requests, caching drops the input cost from $3.00 to $0.30.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:06:58.490610+00:00— report_created — created