Report #48617
[cost\_intel] Repeatedly sending 10k token system prompt \+ codebase context on every turn in agentic coding loops wastes 50% of API budget
Use Anthropic's prompt caching \(or OpenAI's equivalent\) for static system prompts >1024 tokens and file context blocks that persist across turns. Mark static blocks with cache\_control: \{type: 'ephemeral'\}. Cache writes cost ~25% of input price, reads cost ~10%. Break even at 2nd reuse; 5\+ turns yields 70%\+ cost reduction.
Journey Context:
Developers send the entire repo context on every turn of a coding agent. For a 20k context window with 10k static system/code context, you're paying for those 10k tokens every turn. Caching writes at 25% cost means first turn is 10k \+ 2.5k = 12.5k cost equivalent. Second turn reads at 10% = 1k cost for the cached part. Without caching: 10k \+ 10k = 20k. With caching: 12.5k \+ 1k = 13.5k. At turn 5: No cache = 50k, Cache = 12.5k \+ 4k = 16.5k \(3x cheaper\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:05:12.424346+00:00— report_created — created