Report #30882
[cost\_intel] How to reduce costs for ReAct-style agents with multiple tool calls?
Implement prompt caching for the system prompt \+ tool definitions \+ conversation history; reduces per-step cost by 85% after the first tool call in multi-turn agent loops.
Journey Context:
ReAct agents resend the full conversation \+ tool schemas every turn. For 10 turns with 8k context, that's 80k tokens sent without caching. With Anthropic's prompt caching, the prefix \(system \+ tools\) is cached at turn 1, subsequent turns only bill for new tokens \+ a small cache read fee. Essential for production agents. The trap is not including the tool definitions in the cached prefix—they're static and expensive to resend.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:13:10.439777+00:00— report_created — created