Report #45065
[cost_intel] Overlooking structured output and function calling token overhead that silently inflates costs
Budget 15-30% token overhead for JSON schema enforcement and function calling definitions. Use minimal flat schemas for high-volume pipelines; avoid deeply nested objects and verbose field descriptions that bloat both input and output tokens.
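A minimal Python sketch of the flat-versus-nested point. It assumes two hypothetical extraction schemas (`NESTED_SCHEMA`, `FLAT_SCHEMA`) and estimates size with a rough ~4-characters-per-token heuristic instead of a real tokenizer. `strip_descriptions` and `approx_tokens` are illustrative helpers, not a library API; use your provider's tokenizer for actual counts.

```python
import json
from typing import Any

# Hypothetical schemas (assumption, for illustration only): the same three
# fields expressed as a nested, described schema and as a flat one.
NESTED_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "result": {
            "type": "object",
            "description": "The extraction result returned by the model.",
            "properties": {
                "customer": {
                    "type": "object",
                    "description": "Information about the customer mentioned in the text.",
                    "properties": {
                        "name": {"type": "string", "description": "Full name of the customer as written."},
                        "email": {"type": "string", "description": "Email address of the customer, if present."},
                    },
                    "required": ["name", "email"],
                },
                "sentiment": {"type": "string", "description": "Overall sentiment: positive, negative or neutral."},
            },
            "required": ["customer", "sentiment"],
        }
    },
    "required": ["result"],
}

# Flat version: no wrapper objects, no descriptions; allowed values go in an enum.
FLAT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    },
    "required": ["name", "email", "sentiment"],
}


def strip_descriptions(node: Any, in_properties: bool = False) -> Any:
    """Recursively drop JSON Schema 'description' annotations.

    Keys directly inside a 'properties' map are field names, not keywords,
    so a field that is literally called 'description' is kept.
    Other keyword maps ($defs, patternProperties, ...) are not special-cased here.
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "description" and not in_properties:
                continue  # annotation keyword: safe to drop for a production prompt
            child_in_properties = (not in_properties) and key == "properties"
            out[key] = strip_descriptions(value, child_in_properties)
        return out
    if isinstance(node, list):
        return [strip_descriptions(item) for item in node]
    return node


def approx_tokens(schema: Any) -> int:
    """Very rough size estimate: compact JSON length / 4 (assumed ~4 chars per token)."""
    return len(json.dumps(schema, separators=(",", ":"))) // 4


for label, schema in [
    ("nested + descriptions", NESTED_SCHEMA),
    ("nested, descriptions stripped", strip_descriptions(NESTED_SCHEMA)),
    ("flat", FLAT_SCHEMA),
]:
    print(f"{label:32s} ~{approx_tokens(schema)} tokens")
```

The flat version states the allowed sentiment values as an `enum` rather than a prose description, which keeps the constraint while spending fewer tokens than a sentence would.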
Journey Context:
Structured output has three hidden cost sources: (1) schema definitions in the system prompt consume input tokens on every request; a complex JSON schema with field descriptions can easily add 500-1500 tokens. (2) The model generates formatting tokens (braces, quotes, keys) that carry no content; a 50-token answer can grow to 150 tokens once wrapped in JSON. (3) Smaller models sometimes over-explain inside JSON fields, producing verbose values.

For a pipeline doing 1M requests/month, an extra 100 output tokens per request adds up to 100M tokens; at $15 per million output tokens that is $1,500/month of pure formatting overhead.

Mitigations: (1) use the simplest schema that captures your needs, preferring flat key-value pairs over nested objects; (2) omit field descriptions from production schemas (move them to code comments or docs); (3) for simple extractions (a single value or a short list), consider unstructured output with regex post-processing instead of JSON mode; (4) prompt-cache the schema definition to reduce its input-token cost. Caching only discounts repeated input tokens (cache reads are typically billed at a reduced rate, not free) and does nothing for the output-side formatting tokens.
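The monthly-cost arithmetic and mitigation (3) can be sketched as below. The function names, the invoice example, and the `Total: <amount> <currency>` reply convention are hypothetical; the convention is something you would have to specify in your own prompt, and the regex only handles that one shape.

```python
import json
import re
from typing import Optional


def monthly_overhead_usd(requests_per_month: int,
                         extra_output_tokens_per_request: int,
                         usd_per_million_tokens: float) -> float:
    """Monthly cost of extra output tokens (e.g. JSON formatting)."""
    extra_tokens = requests_per_month * extra_output_tokens_per_request
    return extra_tokens * usd_per_million_tokens / 1_000_000


# The report's example: 1M requests/month, 100 extra output tokens each, $15 per 1M tokens.
print(monthly_overhead_usd(1_000_000, 100, 15.0))  # 1500.0

# Mitigation (3): for a single-value extraction, ask for a fixed plain-text line
# instead of JSON and parse it with a regex (hypothetical reply convention).
JSON_REPLY = '{"invoice_total": "1249.50", "currency": "USD"}'
PLAIN_REPLY = "Total: 1249.50 USD"

TOTAL_RE = re.compile(r"Total:\s*(\d+(?:\.\d+)?)\s+([A-Z]{3})\b")


def extract_total(reply: str) -> Optional[tuple[str, str]]:
    """Return (amount, currency), or None so the caller can retry or fall back to JSON mode."""
    match = TOTAL_RE.search(reply)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_total(PLAIN_REPLY))               # ('1249.50', 'USD')
print(json.loads(JSON_REPLY)["invoice_total"])  # 1249.50
print(len(JSON_REPLY) - len(PLAIN_REPLY), "extra characters in the JSON version")
```

The `None` return is deliberate: a regex contract is brittle, so a miss should trigger a retry or a fallback to JSON mode rather than a silent default.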
Lifecycle
2026-06-19T06:06:32.177962+00:00— report_created — created