Report #41256
[cost\_intel] Why does my structured data extraction API cost 5x more than expected despite using GPT-4o-mini?
Move JSON schema definitions from the prompt body to the response\_format parameter \(JSON Mode\) or tools parameter \(Function Calling\). This eliminates schema token repetition, reducing per-request input tokens by 30-60% and preventing the 500-token schema from being charged on every request.
Journey Context:
Developers often paste a 500-token JSON schema into the system prompt to enforce output structure, assuming the model 'needs to see it.' At 1M requests/day, that's 500M tokens of schema repetition daily. JSON Mode \(response\_format: \{type: 'json\_object'\}\) or Function Calling lets the model enforce the schema via constrained decoding without tokenizing the schema as input on every call. Cost delta: 500M tokens \* $0.15/MTok \(mini\) = $75/day saved on mini, and proportionally more for larger models. This is a silent 5x cost multiplier if ignored.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:43:13.261488+00:00— report_created — created