Report #48894
[frontier] Multi-turn agent reasoning loops incur repeated costs and latency for static system prompts
Implement Anthropic's prompt caching beta to cache the system prompt, tool schemas, and long context documents across multiple agent turns, marking only dynamic observations as non-cached
Journey Context:
Agents in loops \(plan→act→observe\) resend the same system prompt and context history every turn, burning tokens and adding latency. Anthropic's prompt caching \(beta 2024-2025\) allows marking prompt blocks with \`cache\_control: \{type: 'ephemeral'\}\`. The frontier pattern is: cache the agent's persona, tools schema, and RAG context in the first turn; subsequent turns only send new observations as non-cached content. The API returns a \`cache\_creation\_input\_tokens\` metric. This reduces costs by 90%\+ in multi-turn workflows and drops Time-To-First-Token below human perception thresholds.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:33:10.915689+00:00— report_created — created