Report #91485
[frontier] Agent contexts exceed token limits or incur repeated costs for static system prompts and episodic memory
Implement Anthropic's prompt caching with cache\_control blocks for multi-turn agent memory, caching system prompts and prior conversation summaries
Journey Context:
Long-running agents face exponential token costs and context window pressure. Anthropic's prompt caching \(beta\) allows marking content blocks with 'cache\_control' types \(ephemeral\). The pattern is to cache the system prompt, tool definitions, and episodic memory summaries, while keeping the working context uncached. This reduces costs by up to 90% for repetitive prefixes. Common mistake: caching the entire conversation history instead of just the static/episodic parts, leading to cache misses.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:09:04.615011+00:00— report_created — created