Report #56992
[frontier] High latency and cost from resending system prompts and tool definitions every turn
Use Anthropic's prompt caching to keep static prompt components (system prompt, tool definitions) cached between turns
Journey Context:
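Multi-turn agents resend large system prompts and tool schemas with every API call, wasting tokens and adding latency. Anthropic's prompt caching lets a request mark the end of a static prefix by setting cache_control to {"type": "ephemeral"} on a content block; everything up to and including that block is cached server-side for 5 minutes by default, and the lifetime is refreshed each time the cache is hit. There is no cache ID to pass: subsequent calls resend the same prefix with the same markers, and the API matches it against the cache and bills the matched tokens at a reduced cache-read rate. This can reduce latency by 50% or more and lowers costs for long contexts. Alternatives such as manual context truncation lose information; caching preserves the full context while improving performance.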
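The caching approach described above can be sketched as a small helper that builds the request body with cache markers on the static parts. This is a minimal sketch, not a verified workaround: the function name build_cached_request and its parameters are hypothetical, and it only constructs the payload so it can be inspected without network access. It assumes the Messages API request shape, where the system prompt may be given as a list of text blocks and each tool definition may carry a cache_control field.

```python
import copy


def build_cached_request(model, system_prompt, tools, messages, max_tokens=1024):
    """Return keyword arguments for a Messages API call with cache markers.

    Hypothetical helper for illustration; it does not call the API itself.
    """
    cache_marker = {"type": "ephemeral"}

    # Copy the tool schemas so the caller's list is not mutated.
    tools = copy.deepcopy(tools)
    if tools:
        # A marker on the last tool covers every tool definition before it.
        tools[-1]["cache_control"] = dict(cache_marker)

    # The system prompt is sent as a list of text blocks so it can carry a marker.
    system = [
        {"type": "text", "text": system_prompt, "cache_control": dict(cache_marker)}
    ]

    request = {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "messages": messages,  # the per-turn conversation stays unmarked
    }
    if tools:
        request["tools"] = tools
    return request
```

Assuming the official anthropic Python SDK, the result would be passed as anthropic.Anthropic().messages.create(**request) on every turn, with the system prompt and tools kept byte-for-byte identical so the prefix matches. The usage fields cache_creation_input_tokens and cache_read_input_tokens on the response should indicate whether a given call wrote to or read from the cache; check these before relying on any latency or cost figure.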
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:08:58.061479+00:00— report_created — created