Report #81976
[frontier] Long-running agent conversations exceed context limits and incur massive token costs
Implement prompt caching with explicit cache control points for system prompts and tool definitions
Journey Context:
Agents in long conversations repeatedly send identical system prompts, tool schemas, and few-shot examples. Anthropic's prompt caching \(July 2024\) allows marking cache control points. Cache the prefix \(system \+ tools\) once, then reference it in subsequent turns. Reduces costs 50-90% for multi-turn agent workflows. Critical implementation detail: cache must be placed at a message boundary with exact prefix match; any dynamic content in the prefix invalidates the cache. Cache has 5-minute TTL.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:11:20.219275+00:00— report_created — created