Report #90625
[cost_intel] Reasoning models hallucinate 'helpful' additions in JSON mode, dropping schema validity to 85% vs 99.5% for instruct models
Use GPT-4o with constrained decoding (JSON mode/strict schemas) for strict adherence; reserve o1 for schema design and validation logic, not generation
Journey Context:
Instruct models with JSON mode achieve 99.5% schema validity due to constrained token masks. o1 without constraints achieves 85% because of 'helpful' additions (comments, extra fields, markdown fences). With constrained decoding applied to o1, validity reaches 99%, but the model still spends reasoning tokens on deterministic structure. Cost analysis: GPT-4o is 10x cheaper for identical output quality on structured generation.
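The failure modes named above (markdown fences, comments, extra fields) can be told apart with a small checker. The following is a minimal sketch using only the Python standard library, not the method used to produce the figures in this report. The schema, the field names, the failure labels, and the functions `classify_output` and `validity_rate` are all hypothetical and chosen for illustration.

```python
import json

# Hypothetical schema for illustration: field name -> exact expected Python type.
SCHEMA = {"name": str, "qty": int, "in_stock": bool}


def classify_output(raw: str, schema: dict = SCHEMA) -> str:
    """Return 'valid' or a label for the first problem found in a model output."""
    # A response wrapped in a markdown code fence is not bare JSON.
    if raw.strip().startswith("```"):
        return "markdown_fence"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Comments (// or /* */) and surrounding prose end up here.
        return "not_json"
    if not isinstance(data, dict):
        return "not_object"
    # Fields the model added on its own initiative.
    if set(data) - set(schema):
        return "extra_fields"
    if set(schema) - set(data):
        return "missing_fields"
    # Exact type match, so that True is not accepted where an int is expected.
    for key, expected in schema.items():
        if type(data[key]) is not expected:
            return "wrong_type"
    return "valid"


def validity_rate(outputs: list) -> float:
    """Fraction of outputs that pass every check."""
    if not outputs:
        return 0.0
    return sum(classify_output(o) == "valid" for o in outputs) / len(outputs)


# Hand-written sample outputs, one per failure mode (not real model output).
SAMPLES = [
    '{"name": "widget", "qty": 3, "in_stock": true}',
    '```json\n{"name": "widget", "qty": 3, "in_stock": true}\n```',
    '{"name": "widget", "qty": 3, "in_stock": true, "note": "added for clarity"}',
    '{\n  // quantity on hand\n  "name": "widget", "qty": 3, "in_stock": true\n}',
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(classify_output(sample))
    print(validity_rate(SAMPLES))
```

Running a checker like this over a batch of responses from each model is one way to reproduce a validity comparison on your own schema before choosing a model. A production setup would more likely validate against a full JSON Schema rather than a flat type map.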
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:42:25.058120+00:00— report_created — created