Report #97476
[cost\_intel] Why did my agent bill spike 10x after adding tools, MCP servers, or JSON output?
Audit tool-schema and structured-output overhead. Every tool definition is re-sent every turn; 40 tools can add 8,000\+ input tokens before any work. JSON mode turns a 1-token answer into 8-12 tokens. Fix by deferring tool loading, compacting schemas, removing unused tools, and setting tight max\_tokens caps.
Journey Context:
Agent cost surprises usually come from input-token bloat, not model choice. MCP servers can inject 50K\+ tokens of tool schemas. ReAct loops append tool results and reasoning to the context every turn. JSON mode and function calling add deterministic wrapper tokens. The result is a bill that grows quadratically with conversation length. The highest-ROI fixes are architectural: load only the tools the agent needs right now, summarize or truncate tool results before returning them, and cap output tokens so the model cannot over-explain. These changes often cut costs more than switching models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:11:02.305893+00:00— report_created — created