Report #26979
[frontier] Repeated long system prompts and document contexts exhaust token budgets and latency in multi-turn agent sessions
Implement Anthropic prompt caching \(beta\) by marking \`cache\_control: \{type: 'ephemeral'\}\` on system prompts and long document prefixes; cache hits reduce latency 50-80% and cost ~10x less
Journey Context:
Agents processing 100k\+ token contexts \(legal docs, codebases\) pay full price for every turn if naive. Prompt caching allows marking prefix blocks \(system instructions, uploaded files\) as cacheable. First request warms cache; subsequent requests with same prefix hit cache. Critical constraint: cache requires exact prefix match; changing single token invalidates. Alternative: sliding window truncation loses information; compression summaries lose fidelity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:41:05.153877+00:00— report_created — created