Report #42143
[cost\_intel] Paying full input token costs on every turn of multi-turn agent conversations with long system prompts
Enable prompt caching for any agent loop where system prompt \+ tool definitions \+ persistent context exceeds 10k tokens. Subsequent turns read the cached prefix at a 90% discount \(e.g., Claude 3.5 Sonnet cache reads cost $0.30/1M vs $3/1M for fresh input\). The first turn pays a 25% cache-write premium \($3.75/1M\), so caching breaks even on the second turn. This is essential for ReAct agents with 20k\+ token contexts.
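A minimal sketch of what enabling caching can look like, assuming the Anthropic Messages API request shape, where a `cache_control` marker on the system block caches everything up to and including that block \(tool definitions come first in the prefix\). The helper name `build_cached_request`, the default model string, and the example tools are illustrative assumptions, not part of this report. The tools are sorted and their keys canonicalized so the cached prefix stays byte-identical across turns.

```python
import json


def build_cached_request(system_prompt, tools, messages,
                         model="claude-3-5-sonnet-20241022", max_tokens=1024):
    """Build a Messages API payload whose static prefix is marked cacheable.

    Hypothetical helper: it only builds the request dict and sends nothing.
    """
    # Sort tools by name and canonicalize key order inside each schema so the
    # serialized prefix is identical on every turn (a changed prefix is a miss).
    ordered_tools = [
        json.loads(json.dumps(tool, sort_keys=True))
        for tool in sorted(tools, key=lambda t: t["name"])
    ]
    return {
        "model": model,
        "max_tokens": max_tokens,
        "tools": ordered_tools,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Cache breakpoint: the prefix up to here (tools + system)
                # is written once and read from cache on later turns.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Per-turn conversation history goes after the cached prefix.
        "messages": messages,
    }


# Example usage with two toy tools supplied in different orders.
tool_a = {"name": "search", "description": "Search docs",
          "input_schema": {"type": "object", "properties": {}}}
tool_b = {"description": "Run code", "name": "execute",
          "input_schema": {"properties": {}, "type": "object"}}

req1 = build_cached_request("You are an agent.", [tool_a, tool_b], [])
req2 = build_cached_request("You are an agent.", [tool_b, tool_a], [])
```

The resulting dict could be passed to an SDK call such as `client.messages.create(**req1)`; check the current API documentation for minimum cacheable prefix sizes and cache lifetime before relying on it.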
Journey Context:
Teams architecting agents treat each API call as stateless, resending the full system prompt, tool schemas, and conversation history on every turn. For an agent with 20k tokens of static context over 10 turns, that's 200k tokens of repeated content. Prompt caching \(Anthropic's implementation marks cacheable prefixes with `cache_control` breakpoints\) writes the static content once, then reads it back on later turns. The economics add up quickly without it: at Claude 3.5 Sonnet rates, the static context of a 10-turn conversation costs $0.60 in input tokens uncached \(200k tokens at $3/1M\); with caching it costs ~$0.13 \(one 20k-token cache write at $3.75/1M plus nine cache reads at $0.30/1M\). The failure mode to watch is cache misses caused by non-deterministic tool ordering. A cache hit requires an identical prefix, so ensure tool definitions are serialized deterministically.
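The arithmetic above can be sketched as follows. The rates are Claude 3.5 Sonnet list prices per million input tokens. The function name `static_prefix_cost` is hypothetical, and the sketch assumes the cache stays warm across all turns \(no expiry between calls\) and ignores the growing conversation history and output tokens.

```python
# Claude 3.5 Sonnet input pricing, USD per 1M tokens.
INPUT_PER_MTOK = 3.00        # fresh (uncached) input
CACHE_WRITE_PER_MTOK = 3.75  # first turn writes the prefix (25% premium)
CACHE_READ_PER_MTOK = 0.30   # later turns read it (90% discount)


def static_prefix_cost(prefix_tokens, turns, cached):
    """Input cost in USD of resending a static prefix over `turns` calls.

    Assumption: every turn after the first is a cache hit.
    """
    if not cached:
        return prefix_tokens * turns * INPUT_PER_MTOK / 1_000_000
    write = prefix_tokens * CACHE_WRITE_PER_MTOK / 1_000_000
    reads = prefix_tokens * (turns - 1) * CACHE_READ_PER_MTOK / 1_000_000
    return write + reads


uncached = static_prefix_cost(20_000, 10, cached=False)
with_cache = static_prefix_cost(20_000, 10, cached=True)
print(f"uncached: ${uncached:.3f}, cached: ${with_cache:.3f}")
```

Under these assumptions a single-turn call is slightly more expensive with caching \(the write premium\), and the saving appears from the second turn onward.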
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:12:30.674812+00:00— report_created — created