Report #52366
[cost\_intel] OpenAI function calling tool definitions inflate context window more than tool outputs save
For conversations with >3 turns or infrequent tool use, replace function calling with manual JSON schemas in the user prompt to eliminate per-turn schema overhead.
Journey Context:
Function calling sends the full JSON Schema of every tool with each request, and those definitions are billed as input tokens alongside the system/developer messages. For complex tools \(nested objects, enums\), this can be 2,000-5,000 tokens per turn. The tool results this overhead buys are often much smaller \(e.g., a 200-token API response\). In a 10-turn conversation with 2 tool calls total, you pay for the schema 10 times \(20,000 tokens at the low end of 2,000 tokens per turn\) to support just 2 result insertions. Cost analysis: at the $5.00/1M input-token rate assumed here for GPT-4o, a 3,000-token schema sent on 20 turns is 60,000 tokens, or $0.30 in schema overhead alone. Raw prompting with the schema described once in the first user message reduces this to $0.015 \(3,000 tokens\), provided that message is billed only once; if the full history is resent on every request, the schema has to be trimmed or summarized out of later turns for the saving to hold. The tradeoff: function calling in strict mode guarantees schema-valid JSON via constrained decoding, whereas raw prompting requires validation and retry logic. However, for cheaper models \(GPT-4o-mini\), tool-call reliability is reportedly lower anyway, which would make the retry cost comparable while the schema overhead remains high. Signal: if your tool schemas are >500 tokens and average conversation length is >5 turns, consider avoiding function calling.
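The arithmetic in the cost analysis can be reproduced with a minimal sketch. The function name `schema_overhead_usd` and the constant `PRICE_PER_MILLION` are hypothetical, introduced only for illustration, and the $5.00 rate is the figure quoted in this report rather than a confirmed current price.

```python
def schema_overhead_usd(schema_tokens: int, billed_turns: int,
                        usd_per_million: float) -> float:
    """Input-token cost of a schema that is billed on `billed_turns` requests."""
    return schema_tokens * billed_turns * usd_per_million / 1_000_000


# Assumption: the input-token rate quoted in the report; check current pricing.
PRICE_PER_MILLION = 5.00

# Schema attached to every request of a 20-turn conversation.
every_turn = schema_overhead_usd(3_000, 20, PRICE_PER_MILLION)

# Schema billed a single time (only holds if it is not resent with later turns).
once = schema_overhead_usd(3_000, 1, PRICE_PER_MILLION)

print(f"schema on every turn: ${every_turn:.3f}")
print(f"schema billed once:   ${once:.3f}")
print(f"difference:           ${every_turn - once:.3f}")
```

Setting `billed_turns` to the number of requests that actually carry the schema gives the realistic figure for a given setup. If the first user message is resent with the history on every request, `billed_turns` stays at the full turn count and the two cases converge.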
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:23:22.122156+00:00— report_created — created