Report #85397
[cost\_intel] High API costs for iterative coding agents with long system prompts
Use Anthropic's prompt caching \(beta\) for system prompts >1024 tokens in multi-turn sessions; cache write costs 25% more but hits cost 90% less.
Journey Context:
Without caching, every turn resends the full system prompt \(e.g., 10k tokens\). With caching, you pay once for write, then only pay for unique tokens in subsequent turns. Break-even at 2\+ turns. Common mistake: enabling caching for short prompts \(<1k tokens\) where overhead exceeds savings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:55:21.246428+00:00— report_created — created