Report #70723
[cost_intel] Function calling triples API costs despite shorter completions
Pre-calculate tool-definition overhead with tiktoken. OpenAI embeds the full JSON schema of every available tool into each request's system message, so the definitions are billed as input tokens whether or not a tool is called; a 2 KB tool schema adds roughly 500 tokens per request. To reduce the overhead, trim each schema to its essential parameters or switch to 'single-tool' mode, which this report estimates cuts overhead by about 60%. If tool definitions exceed 30% of the context window, use the 'tool-as-text' pattern instead: embed a compact description of the schema in the system prompt rather than passing full tool definitions.
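The overhead estimate above can be sketched as follows. This is a minimal illustration, not OpenAI's actual accounting: the exact way tool definitions are rendered into the prompt is not public, so the count is an approximation based on compact JSON. The `cl100k_base` encoding, the 4-characters-per-token fallback, the `search_orders` tool, and all helper names are assumptions made for the example.

```python
import json


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available, else estimate ~4 chars per token."""
    try:
        import tiktoken  # third-party: pip install tiktoken

        # Assumption: cl100k_base; use the encoding that matches your model.
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        # Rough fallback when tiktoken or its encoding files are unavailable.
        return max(1, len(text) // 4)


def tool_overhead(tools: list) -> int:
    """Approximate the per-request tokens spent on tool definitions."""
    return count_tokens(json.dumps(tools, separators=(",", ":")))


def trim_tool(tool: dict) -> dict:
    """Keep only required parameters and drop per-parameter descriptions."""
    fn = tool["function"]
    params = fn.get("parameters", {})
    required = list(params.get("required", []))
    props = {
        name: {k: v for k, v in spec.items() if k != "description"}
        for name, spec in params.get("properties", {}).items()
        if name in required
    }
    return {
        "type": "function",
        "function": {
            "name": fn["name"],
            "description": fn.get("description", ""),
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }


def overhead_share(tools: list, context_window: int) -> float:
    """Fraction of the context window consumed by tool definitions."""
    return tool_overhead(tools) / context_window


# Hypothetical tool definition, used only to exercise the helpers.
SEARCH_ORDERS = {
    "type": "function",
    "function": {
        "name": "search_orders",
        "description": "Search a customer's orders.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "Internal customer identifier."},
                "status": {"type": "string", "enum": ["open", "shipped", "cancelled"],
                           "description": "Optional order status filter."},
                "limit": {"type": "integer", "description": "Maximum number of results to return."},
            },
            "required": ["customer_id"],
        },
    },
}

if __name__ == "__main__":
    print("full:", tool_overhead([SEARCH_ORDERS]))
    print("trimmed:", tool_overhead([trim_tool(SEARCH_ORDERS)]))
    print("share of an 8k window:", overhead_share([SEARCH_ORDERS], 8192))
```

Comparing the two counts before deploying gives a concrete per-request saving to multiply by request volume; the 30% threshold from the report can be checked with `overhead_share`.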
Journey Context:
Developers often assume function calling saves tokens by reducing back-and-forth. In practice, every tool definition is injected into the prompt context on every API call. For complex tools with nested objects, this can consume 2k-4k tokens per request before any user input. A common mistake is assuming 'tools' are handled separately from the context window; they are not, they expand the system message. Alternatives: describing schemas in YAML instead of JSON (slightly fewer tokens), or the 'unified schema' pattern, where you describe all tools in a single text block and ask the model to output tool calls as JSON in a markdown code block. The unified schema pattern reportedly reduces schema overhead by about 40% but requires more prompt engineering.
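One way the 'unified schema' pattern could look is sketched below, using only the standard library. The one-line-per-tool text format, the `{"tool": ..., "args": ...}` reply shape, the `get_weather` and `get_time` tools, and the helper names are all hypothetical choices for illustration; the report does not prescribe a specific format.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# Matches the first fenced JSON object in a model reply.
CALL_RE = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def render_tools_as_text(tools: list) -> str:
    """Describe all tools in one compact text block for the system prompt."""
    lines = [
        "Available tools. To call one, reply with a single fenced json block "
        'of the form {"tool": "<name>", "args": {...}}.'
    ]
    for tool in tools:
        fn = tool["function"]
        params = fn.get("parameters", {})
        required = set(params.get("required", []))
        # Optional parameters are marked with a trailing "?".
        args = ", ".join(
            f"{name}: {spec.get('type', 'any')}{'' if name in required else '?'}"
            for name, spec in params.get("properties", {}).items()
        )
        lines.append(f"- {fn['name']}({args}): {fn.get('description', '')}")
    return "\n".join(lines)


def parse_tool_call(reply: str):
    """Return (tool_name, args) from a model reply, or None if no valid call is found."""
    match = CALL_RE.search(reply)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "tool" not in call:
        return None
    return call["tool"], call.get("args", {})


# Hypothetical tool definitions, used only to exercise the helpers.
EXAMPLE_TOOLS = [
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}, "units": {"type": "string"}},
                       "required": ["city"]}}},
    {"type": "function", "function": {
        "name": "get_time",
        "description": "Current time in a timezone.",
        "parameters": {"type": "object",
                       "properties": {"tz": {"type": "string"}},
                       "required": ["tz"]}}},
]

if __name__ == "__main__":
    print(render_tools_as_text(EXAMPLE_TOOLS))
```

Because the model's output is no longer constrained by the API's tool-calling machinery, the parser has to tolerate missing or malformed blocks; returning `None` lets the caller retry or fall back, which is part of the extra prompt engineering the report mentions.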
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:17:17.063978+00:00— report_created — created