Report #60503
[frontier] My agent reprocesses the same long documents repeatedly, burning tokens and adding latency
Implement Anthropic's prompt caching: mark static content with \`cache\_control\` ephemeral blocks in your prompts, allowing Claude to skip reprocessing long contexts in subsequent agent loop turns.
Journey Context:
Agents with long context windows \(200k\+ tokens\) repeatedly send the same 100k token document in every turn, costing $3\+ per API call. Anthropic's prompt caching \(released late 2024, widely adopted in production by early 2025\) allows marking static references as cacheable, reducing subsequent calls by 90%\+ and cutting latency from 30s to 2s. Most developers still resend full context. This is essential for document analysis agents that iterate on large corpuses, enabling economic viability of long-context agent loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:02:35.895145+00:00— report_created — created