Report #75603
[cost_intel] Agentic workflows paying full input token price on every tool-calling turn
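Cache the system prompt plus all tool and function definitions as a static prefix. In agentic loops making 5-10 tool calls per task, this cuts input token costs for the static portion by roughly 67-79%, and the savings approach 90% as loops get longer (assuming cache writes at 1.25x and cache reads at 0.1x the base input price). Tool definitions alone are often 500-2000 tokens and typically do not change within a session.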
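A minimal sketch of what a cacheable static prefix looks like in a request. It assumes the Anthropic Messages API, whose cache-write (1.25x) and cache-read (0.1x) multipliers match the figures in this report, and its `cache_control: {"type": "ephemeral"}` block marker. The system prompt text, the `get_order` tool, and the `build_request` helper are hypothetical. The function only builds request kwargs, so it runs without network access; in practice the result would be passed to `client.messages.create(**...)` from the official `anthropic` SDK.

```python
import copy
from typing import Any

# Hypothetical static prefix: identical on every turn of the agent loop.
SYSTEM_PROMPT = "You are a support agent. Use the tools to look up orders."
TOOLS: list[dict[str, Any]] = [
    {
        "name": "get_order",  # hypothetical tool
        "description": "Look up an order by ID.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
]


def build_request(messages: list[dict[str, Any]], model: str) -> dict[str, Any]:
    """Build Messages API kwargs whose tools + system prefix never changes.

    The cache_control marker on the system block ends the cached prefix.
    Tools are rendered before the system prompt, so both are covered.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": copy.deepcopy(TOOLS),  # static: same on every turn
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,  # dynamic: the only part that grows per turn
    }
```

Anything that varies per turn (timestamps, request IDs, per-user details) should stay out of the system prompt and tool definitions; otherwise the prefix changes and every call misses the cache. Providers also typically enforce a minimum cacheable prefix length, so check their documentation before relying on small prefixes.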
Journey Context:
Agentic workflows send the same system prompt and tool definitions on every API call in a loop. A 5-turn agent loop with a 2000-token static prefix pays for 10000 input tokens of repeated prefix alone. With prompt caching, the first call writes the cache at 1.25x the base input price and each subsequent call reads it at 0.1x, so the static prefix costs (2000 × 1.25) + (4 × 2000 × 0.1) = 2500 + 800 = 3300 token-units instead of 10000. That is a 67% reduction. This is the single highest-ROI use of prompt caching because agentic loops repeat the same prefix, usually seconds apart and well within the cache TTL window. Teams that skip caching on agent loops are leaving the largest available savings on the table.
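The arithmetic can be checked with a small cost model. It assumes every call after the first lands inside the TTL and hits the cache; `static_prefix_cost` is a hypothetical helper, and its multipliers default to the 1.25x write and 0.1x read figures above.

```python
def static_prefix_cost(prefix_tokens: int, calls: int,
                       write_mult: float = 1.25,
                       read_mult: float = 0.1) -> tuple[float, float]:
    """Return (uncached, cached) cost of the static prefix in token-units.

    Assumes the first call writes the cache and every later call reads it.
    """
    if calls < 1:
        raise ValueError("calls must be >= 1")
    uncached = float(prefix_tokens * calls)
    cached = prefix_tokens * write_mult + prefix_tokens * read_mult * (calls - 1)
    return uncached, cached


for n in (5, 10):
    uncached, cached = static_prefix_cost(2000, n)
    print(f"{n} calls: {uncached:.0f} -> {cached:.0f} token-units, "
          f"{1 - cached / uncached:.1%} saved")
# 5 calls: 10000 -> 3300 token-units, 67.0% saved
# 10 calls: 20000 -> 4300 token-units, 78.5% saved
```

The savings only tend toward 1 - 0.1 = 90% as the number of calls grows; for short loops the one-time 1.25x write premium keeps them noticeably lower.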
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:29:38.643025+00:00 - report_created - created