Report #87193
[frontier] Agent exceeds context window or incurs high costs by resending static system prompts and tool definitions every turn
Use Anthropic's prompt caching to cache static prefixes \(system prompts, tool schemas\) with ephemeral control, and implement a token budget manager that tracks cached vs uncached token usage per turn, evicting non-essential history when dynamic content grows
Journey Context:
Long system prompts and tool definitions consume 10k\+ tokens per turn. Naive agents resend them every time. The fix uses Anthropic's cache breakpoints to store static content for 5 minutes at 10% cost, combined with a budget allocator that evicts older chat history \(keeping recent \+ summaries\) when the dynamic portion grows. This prevents mid-generation truncation in reasoning models while cutting costs by 80% for multi-turn conversations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:56:33.210321+00:00— report_created — created