Report #24835
[cost_intel] Ignoring the 200-500 token overhead per function call in OpenAI/Anthropic tool use, causing 40% cost inflation in multi-step agents
Count tool-schema tokens as part of the context-window budget; use 'strict': false in OpenAI tools when possible to keep schemas shorter; for simple extractions, prefer passing a single tool per call.
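A minimal sketch of counting tool-schema tokens against the budget, in Python. It assumes a rough heuristic of about 4 characters per token and measures the compact JSON of the tool list as a proxy. Providers render tool definitions into their own internal format, so real counts differ, and the usage figures returned by the API are the numbers to trust. The helper names and the `search_database` parameters are hypothetical.

```python
import json
import math

# Assumption: roughly 4 characters per token for English/JSON text.
# Providers serialize tools in their own internal format, so this is only an
# estimate; the API response's usage figures are authoritative.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def tool_overhead_tokens(tools: list) -> int:
    """Estimate the input tokens added by sending `tools` with one request."""
    return estimate_tokens(json.dumps(tools, separators=(",", ":")))


def loop_overhead_tokens(tools: list, steps: int) -> int:
    """Tools are re-sent on every call, so overhead scales with the step count."""
    return tool_overhead_tokens(tools) * steps


def fits_budget(prompt_tokens: int, tools: list, context_budget: int) -> bool:
    """Count tool-schema tokens against the budget, not just the messages."""
    return prompt_tokens + tool_overhead_tokens(tools) <= context_budget


# Hypothetical tool definition in OpenAI Chat Completions format.
search_database = {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the product database and return matching rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text search query"},
                "limit": {"type": "integer", "description": "Maximum rows to return"},
            },
            "required": ["query"],
        },
    },
}

tools = [search_database]
per_call = tool_overhead_tokens(tools)
print(f"~{per_call} tokens per call, ~{loop_overhead_tokens(tools, 10)} over 10 steps")
```

Because the tool list is re-sent on every request, the loop total grows linearly with the number of steps, which is why a schema that looks cheap in a single call adds up in an agent loop.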
Journey Context:
When building agents with function calling, developers define JSON schemas for tools (e.g., 'search_database'). The model receives not just the user message but the full tool definition (function name, description, parameters), and those tokens are billed as input. Because the APIs are stateless, the tool list is re-sent on every request. For a complex schema with 10 fields, this adds 300-600 tokens to the prompt *per call*. In a 10-step agent loop, that's 3k-6k tokens (10 × 300-600) of 'hidden' cost. OpenAI's 'strict' mode (guaranteeing JSON schema adherence) may add further overhead; OpenAI documents extra latency on the first request with a new schema, so compare the usage figures with and without it before assuming a token cost.

The fix: simplify schemas (flatten nested objects), keep descriptions under 100 characters, and avoid strict mode unless schema adherence is critical. Also weigh 'tools' against 'response_format': for simple extraction, response_format JSON mode can be cheaper than function calling, since no tool definitions are sent and the desired fields are described in the prompt instead.
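The schema fixes above (flatten nested objects, descriptions under 100 characters) could be automated along these lines. This is a hypothetical helper, not part of any SDK: it lifts nested object parameters into prefixed top-level fields (e.g. `filters.category` becomes `filters_category`) and hard-truncates long descriptions. Hand-written short descriptions are preferable to blind truncation, and the tool handler has to map the flat arguments back to whatever nested shape your code expects.

```python
import copy
import json

MAX_DESC_CHARS = 99  # "descriptions under 100 chars"


def trim_description(desc: str, max_len: int = MAX_DESC_CHARS) -> str:
    """Hard-truncate a description; hand-written short text is preferable."""
    if len(desc) <= max_len:
        return desc
    return desc[: max_len - 3].rstrip() + "..."


def flatten_properties(props, required, prefix="", parent_required=True):
    """Lift nested object properties to the top level as prefix_name fields.

    A child counts as required only if its parent object is required too;
    children of an optional parent become optional in the flat schema.
    """
    flat, flat_required = {}, []
    for name, spec in props.items():
        key = f"{prefix}{name}"
        is_required = parent_required and name in required
        if spec.get("type") == "object" and "properties" in spec:
            sub_props, sub_required = flatten_properties(
                spec["properties"], spec.get("required", []), key + "_", is_required
            )
            flat.update(sub_props)
            flat_required.extend(sub_required)
        else:
            leaf = dict(spec)
            if "description" in leaf:
                leaf["description"] = trim_description(leaf["description"])
            flat[key] = leaf
            if is_required:
                flat_required.append(key)
    return flat, flat_required


def slim_tool(tool: dict) -> dict:
    """Return a copy of an OpenAI-style tool with flat params and short descriptions."""
    tool = copy.deepcopy(tool)  # never mutate the caller's definition
    fn = tool["function"]
    if "description" in fn:
        fn["description"] = trim_description(fn["description"])
    params = fn["parameters"]
    params["properties"], params["required"] = flatten_properties(
        params.get("properties", {}), params.get("required", [])
    )
    return tool


# Hypothetical nested tool with a deliberately verbose description.
nested_tool = {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the database. " * 10,
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search text"},
                "filters": {
                    "type": "object",
                    "properties": {
                        "category": {"type": "string"},
                        "max_price": {"type": "number"},
                    },
                    "required": ["category"],
                },
            },
            "required": ["query", "filters"],
        },
    },
}

slim = slim_tool(nested_tool)
print(sorted(slim["function"]["parameters"]["properties"]))
# ['filters_category', 'filters_max_price', 'query']
print(slim["function"]["parameters"]["required"])
# ['query', 'filters_category']
print(len(json.dumps(nested_tool)), "->", len(json.dumps(slim)), "chars")
```

Flattening removes the wrapper object's own `type`/`properties`/`required` keys at the cost of longer field names; in this example most of the saving comes from the shortened description.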
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:05:37.783766+00:00— report_created — created