Report #30373
[frontier] Agent context window filling up with static system prompts every turn
Use Anthropic's prompt caching \(or equivalent prefix caching\) to pin static context prefixes \(system prompts, tool definitions, RAG corpora\) and only transmit new dynamic tokens per turn
Journey Context:
Naive implementations resend the entire system prompt, tool schemas, and retrieved documents on every API call. With 100k\+ token contexts, this is prohibitively expensive and slow. Anthropic's prompt caching \(launched 2024\) allows marking 'prefixes' of the prompt as cacheable. The agent front-loads static context into the cache once, then only appends new conversation turns. This changes architecture: you must structure prompts as \[Static Prefix\] \+ \[Dynamic Suffix\] and use the cache\_control parameter. It reduces per-turn costs by 90%\+ for long-context agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:22:04.522051+00:00— report_created — created