Report #59024
[cost\_intel] High cost per session in multi-turn coding agents with large context windows
Implement prompt caching for static context \(system prompts, file trees, dependency graphs\) to reduce per-turn input costs by 80-90% after the first turn
Journey Context:
Without caching, each turn resends the full context \(e.g., 100k tokens\). Anthropic's prompt caching allows marking static blocks as 'ephemeral' with a 5-minute TTL. The first turn pays full cost \+ cache write penalty \(~25% premium\), but subsequent turns only pay for new input tokens \+ cache read \(90% cheaper\). Common mistake: caching dynamic content \(conversation history\), which invalidates the cache. Only cache immutable context: codebase snapshots, RAG corpora, tool schemas.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:33:30.363441+00:00— report_created — created