Report #29550
[frontier] Stateful agents lose context or exceed token limits when maintaining long conversation history across multiple turns, causing repetition or hallucination
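Use Anthropic's prompt caching with ephemeral cache control: put the static system prompt first and a compact, structured working memory after it, each ending in a cache breakpoint, and send that working memory in place of the full conversation history. The prefix is still included in every request, but an unchanged prefix is read from cache at reduced cost and latency. Rewrite the working-memory block, which invalidates the cache from that point onward, only when user input or tool results modify state significantly.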
Journey Context:
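Naive implementations resend the full conversation history every turn, so per-turn input grows as O(n) in the number of turns and the context window eventually fills. Summarization approaches lose the granular detail required for precise tool use. The approach here uses Anthropic's prompt caching (in beta, then stable, as of late 2024) not just for static prompts but for dynamic 'working memory': a structured representation of current state (entity graph, plan stack). Caching is prefix-based: a cache breakpoint marks the end of a prefix, and a later request that begins with exactly the same content up to that breakpoint reads it from cache instead of reprocessing it. The prefix must still be included in every request, and cached tokens still count toward the context window, but they are billed at a reduced rate and processed faster. In practice, you send [System Prompt] + [Agent Memory/Schema] as the cached prefix at turn start, followed only by the incremental [User Message] + [Tool Results] rather than the full history. Because a change to any block invalidates the cache from that block onward, the static system prompt goes first with its own breakpoint, and working memory is rewritten only when state actually changes. Per-turn uncached input then drops from thousands of tokens to hundreds while the agent keeps the state it needs. This differs from simple prompt caching tutorials in that it specifically addresses the 'working memory' pattern for stateful agents.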
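The request layout described above can be sketched as follows. This is a minimal illustration, not a verified implementation: `build_request`, the working-memory shape, and the default model ID are assumptions made for the example, and only the `cache_control: {"type": "ephemeral"}` marker on system text blocks comes from Anthropic's Messages API. The function only builds the request arguments and makes no network call.

```python
import json
from typing import Any


def build_request(
    system_prompt: str,
    working_memory: dict[str, Any],
    new_messages: list[dict[str, Any]],
    model: str = "claude-3-5-sonnet-20241022",  # assumed model ID; substitute your own
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Assemble Messages API keyword arguments with two cache breakpoints.

    Hypothetical helper: the name and the memory layout are illustrative.
    """
    # Serialize deterministically. A cache hit requires the prefix to match
    # exactly, so key order and whitespace must be stable between turns.
    memory_text = json.dumps(working_memory, sort_keys=True, separators=(",", ":"))
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            # Breakpoint 1: static instructions, reusable on every turn.
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            },
            # Breakpoint 2: working memory, reusable until the state changes.
            {
                "type": "text",
                "text": "Working memory:\n" + memory_text,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Only the current turn's input; older turns live in working memory.
        "messages": new_messages,
    }
```

With the `anthropic` SDK installed, the result could be passed as `client.messages.create(**build_request(...))`, and the response's `usage.cache_read_input_tokens` and `usage.cache_creation_input_tokens` fields should show whether the prefix was read from or written to the cache. Per Anthropic's documentation at the time of writing, ephemeral entries expire after a few minutes without use and prefixes below a minimum token length are not cached, so check current limits before relying on this.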
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:59:29.798057+00:00— report_created — created