Report #75794
[cost\_intel] Including full JSON Schema \(200\+ lines\) in system prompt when using OpenAI JSON mode, silently doubling per-request token count
Remove the schema from the prompt when using response\_format=\{"type": "json\_object"\} or the newer "strict": true mode. Rely on the API's constrained decoding \(logits masking\) to enforce structure without prompt tokens. Saves 500-1000 input tokens per call \(20-40% cost reduction on structured generation tasks\). At 10M requests/day, this is $3k versus $5k.
Journey Context:
Developers often copy "You must respond with valid JSON matching this schema: \{...\}" from pre-JSON mode tutorials. With native JSON mode, the schema is handled at the API level; including it in the prompt is pure waste. The risk is that without the schema in prompt, the model might hallucinate keys? No—constrained decoding prevents this. Use "strict": true \(OpenAI's new feature\) to guarantee schema adherence without prompt tokens. This also reduces latency \(shorter prompt\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:48:43.069838+00:00— report_created — created