Report #77167
[cost_intel] Underestimating token consumption when using native tool calling or JSON mode vs raw prompting
Budget for 30-50% token overhead when using OpenAI's tool calling or JSON mode instead of unstructured output. Hidden schema-enforcement tokens and serialized function definitions can turn a 1k-token request into roughly 1.5k effective tokens, which can wipe out the cost savings expected from structured output in high-volume pipelines.
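The budgeting rule above can be sketched as simple arithmetic. This is a minimal illustration, assuming the 30-50% overhead range quoted in this report; the function names `effective_tokens` and `budget_range` are hypothetical, and the real overhead for a given schema should be measured from the API's reported usage rather than assumed.

```python
def effective_tokens(base_tokens: int, overhead_pct: int) -> int:
    """Tokens to budget once structured-output overhead is included.

    overhead_pct is a whole-number percentage (e.g. 50 for 50%).
    Integer math avoids float rounding; the result is rounded up.
    """
    if base_tokens < 0 or overhead_pct < 0:
        raise ValueError("base_tokens and overhead_pct must be non-negative")
    total = base_tokens * (100 + overhead_pct)
    return (total + 99) // 100  # ceiling division by 100


def budget_range(base_tokens: int, low_pct: int = 30, high_pct: int = 50) -> tuple:
    """Low and high token budgets for the assumed overhead range."""
    return (
        effective_tokens(base_tokens, low_pct),
        effective_tokens(base_tokens, high_pct),
    )


if __name__ == "__main__":
    low, high = budget_range(1000)
    print(f"Budget between {low} and {high} tokens for a 1000-token request")
```

With the defaults, a 1000-token request is budgeted at 1300 to 1500 tokens, matching the 1k to 1.5k figure above.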
Journey Context:
OpenAI's tool calling and JSON mode inject system-level instructions and schemas that are not visible in the raw prompt. For function calling, the function definitions (names, descriptions, parameters) are tokenized, billed as input, and count against the context limit. For JSON mode, hidden schema enforcement adds tokens, and the output itself grows because of braces, quotes, and repeated keys.
Common error: comparing "raw prompt cost" to "structured output cost" without accounting for the 30-50% token inflation.
Example (the dollar figures imply $0.015 per 1k output tokens, which is well above GPT-4o-mini's list price, so treat them as illustrative): a task requiring 1000 output tokens costs $0.015 as raw text. In JSON mode, schema overhead and repeated keys might push it to 1400 tokens, costing $0.021, a 40% increase that erodes the "cheap model" advantage.
Mitigation: use compact JSON schemas, avoid deeply nested objects, and prefer raw prompting with regex validation for simple structures.
Quality degradation: none inherent, but token limits are reached about 30% sooner, which can cause truncation errors.
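The "raw prompting with regex validation" mitigation can be sketched as follows. This is only an illustration under stated assumptions: the model is prompted to answer in a hypothetical `key: value` line format, and the field names `sentiment` and `score`, the pattern `LINE_RE`, and the function `parse_raw_output` are invented for this example rather than taken from any library.

```python
import re

# Hypothetical plain-text format the model is prompted to follow:
#   sentiment: positive
#   score: 0.87
LINE_RE = re.compile(r"^(?P<key>[a-z_]+):\s*(?P<value>.+?)\s*$")
EXPECTED_KEYS = {"sentiment", "score"}  # assumed fields for this sketch


def parse_raw_output(text: str) -> dict:
    """Validate raw model output and return its fields as strings.

    Raises ValueError if a line does not match the expected format or
    if an expected key is missing, so the caller can retry the request.
    """
    fields = {}
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        fields[match.group("key")] = match.group("value")
    missing = EXPECTED_KEYS - fields.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return fields


if __name__ == "__main__":
    print(parse_raw_output("sentiment: positive\nscore: 0.87"))
```

The trade-off is that validation failures must be handled by the caller (for example with a retry), so this approach tends to pay off only for flat, simple structures, as the mitigation notes.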
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:07:17.382077+00:00— report_created — created