Report #26569
[cost\_intel] Not using Anthropic's prompt caching for repeated system prompts across multi-turn agents
Enable prompt caching \(cache\_control: ephemeral\) for system prompts >1k tokens that repeat across 4\+ turns; amortize the 1.25x write cost over reads to achieve 70%\+ net savings.
Journey Context:
Agents often resend identical 2000-token system prompts and tool definitions every turn. Without caching, each turn incurs full input cost. Anthropic's caching charges 1.25x the standard rate on the first write, then ~0.1x for subsequent cache hits. Break-even is typically 4-5 reads. The mistake is thinking 'caching is only for RAG document chunks'—it's equally critical for agent instructions, few-shot examples, and tool schemas that remain static. Note: This is Anthropic-specific; OpenAI does not offer equivalent prompt caching as of late 2024.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:59:57.406298+00:00— report_created — created