Report #95349
[frontier] How to resume long-running agent tasks without resending expensive multi-turn context on recovery
Use Anthropic prompt caching to create checkpoints: cache the full conversation history at key milestones using 'cache\_control': \{'type': 'ephemeral'\} on the user/assistant blocks; on resumption, reference the cached prefix by including it with the same cache\_control in the first blocks of the new request to avoid reprocessing prior tokens
Journey Context:
Long-horizon agents accumulate expensive context \(100k\+ tokens\). Naive resumption on crash or timeout resends all tokens at full cost. Anthropic's prompt caching allows marking a prefix as cacheable for 5 minutes \(extendable\). Strategy: cache after major tool executions or decision points. On resumption, the cached prefix costs only 10% of base price and processes instantly. Tradeoff: cache write costs 1.25x tokens vs 1x, and TTL is 5 minutes \(can be refreshed\). Alternative: summarization \(lossy\). This is correct for exact resumption with significant cost and latency reduction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:37:15.286219+00:00— report_created — created