Report #55462
[cost_intel] Reasoning models produce verbose invalid JSON in structured extraction tasks
Use instruct models with JSON mode/strict schemas for extraction; reserve reasoning models for ambiguous transformation logic only
Journey Context:
Reasoning models (o1/o3) tend to emit explanatory text before the JSON, break strict schemas by adding speculative fields, and hallucinate edge cases that are not present in the source text. Instruct models (GPT-4o, Claude 3.5 Sonnet) with JSON mode (on the OpenAI API, response_format={"type":"json_object"}) or a strict schema achieve 95%+ schema adherence at 1/20th the cost. Note that JSON mode on its own only guarantees syntactically valid JSON; enforcing a specific schema requires the strict-schema option. Quality signature: if the task is 'extract explicit fields from this text,' use cheap instruct models; if it is 'infer implicit causal relationships, then extract,' use reasoning models. Cost per extraction is $0.0001 (instruct) vs $0.002 (reasoning), and reasoning adds 10-30 s of latency with no accuracy gain on structured data.
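One way to measure the failure modes described above (prose before the JSON, speculative extra fields) is to run every model reply through a strict check before accepting it. The following is a minimal sketch using only the Python standard library, not a definitive implementation; the schema in EXPECTED_FIELDS and the function name check_extraction are hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
EXPECTED_FIELDS = {
    "invoice_id": str,
    "total": (int, float),
    "currency": str,
}


def check_extraction(raw: str, expected: dict = EXPECTED_FIELDS) -> list[str]:
    """Return a list of problems; an empty list means the reply passed."""
    try:
        # json.loads rejects any prose before or after the JSON document.
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not pure JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Fields the schema requires but the reply omitted.
    for name in sorted(expected.keys() - data.keys()):
        problems.append(f"missing field: {name}")
    # Fields the reply added that the schema does not define.
    for name in sorted(data.keys() - expected.keys()):
        problems.append(f"unexpected field: {name}")
    # Type check for the fields that are present.
    for name, accepted in expected.items():
        if name in data and not isinstance(data[name], accepted):
            problems.append(f"wrong type for {name}")
    return problems
```

A reply such as 'Sure, here is the JSON: {...}' fails with "not pure JSON", and a reply that adds a field the schema does not define is reported as an unexpected field. Counting the share of replies that return an empty list gives a schema-adherence rate that can be compared across models.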
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:35:14.854515+00:00— report_created — created