Report #28729
[cost\_intel] Why do reasoning models fail at simple JSON extraction tasks?
Do not use reasoning models such as o1-preview or o3 for structured extraction from unstructured text. Use GPT-4o with response\_format=\{"type": "json\_object"\} and validate the output against a strict zod schema. On trivial extraction, reasoning models tend to hallucinate extra fields and add 'overthinking' latency.
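As a minimal sketch of the recommended setup, the request body for a JSON-mode call could be built as follows. The helper name `buildExtractionRequest`, the prompt wording, and `temperature: 0` are assumptions made for illustration and are not from the report. No network call is made here; the object would be sent as the body of a chat completions request.

```typescript
// Shape of the request body; only the fields used in this sketch are typed.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ExtractionRequest {
  model: string;
  messages: ChatMessage[];
  response_format: { type: "json_object" };
  temperature: number;
}

// Hypothetical helper: builds the body for a chat completions call in JSON mode.
function buildExtractionRequest(documentText: string): ExtractionRequest {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        // JSON mode expects the word "JSON" to appear somewhere in the prompt.
        content:
          'Extract the invoice total. Reply with JSON only: {"invoice_amount": number}.',
      },
      { role: "user", content: documentText },
    ],
    response_format: { type: "json_object" },
    // Assumption: deterministic output is preferable for extraction.
    temperature: 0,
  };
}
```

JSON mode only ensures the reply parses as JSON, so the parsed result should still be checked against the schema before use.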
Journey Context:
Users assume 'smarter model = better extraction', but reasoning models are optimized for reasoning through uncertainty, not for pattern matching. Asked to extract 'invoice\_amount' from a PDF, o1-preview spends about 8 seconds 'thinking' about potential currency conversions and edge cases, then returns JSON with spurious fields such as 'confidence\_score' and 'currency\_exchange\_rate' that are not in the schema. GPT-4o extracts the value in about 300ms and adheres to the schema. With o1 the cost is roughly 50x higher for worse structured output. Rule: if the task is 'read this and return JSON', use the fastest instruct model with constrained decoding \(JSON mode\). JSON mode guarantees syntactically valid JSON, not conformance to a schema, so keep the strict schema check. Exception: if the extraction requires complex inference \(e.g., 'infer the user's intent from this ambiguous chat history'\), reasoning models may help, but still prefer GPT-4o for the post-processing step that turns the result into JSON.
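The spurious-field failure described above is what a strict schema check is meant to catch. The following is a dependency-free sketch of that check, assuming a hypothetical one-field schema; `Invoice`, `ALLOWED_KEYS`, and `parseInvoice` are illustrative names, not part of the report. With zod, a comparable effect would come from calling `.strict()` on a `z.object({...})` schema.

```typescript
// Hypothetical schema for illustration: the only field the caller asked for.
interface Invoice {
  invoice_amount: number;
}

const ALLOWED_KEYS: ReadonlySet<string> = new Set(["invoice_amount"]);

// Parse model output and reject anything outside the schema,
// including extra fields such as confidence_score.
function parseInvoice(raw: string): Invoice {
  const data: unknown = JSON.parse(raw); // throws on malformed JSON
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("expected a JSON object");
  }
  const record = data as Record<string, unknown>;

  // Strictness: any key not declared in the schema is an error.
  const extra = Object.keys(record).filter((key) => !ALLOWED_KEYS.has(key));
  if (extra.length > 0) {
    throw new Error(`unexpected fields: ${extra.join(", ")}`);
  }

  const amount = record["invoice_amount"];
  if (typeof amount !== "number" || !Number.isFinite(amount)) {
    throw new Error("invoice_amount must be a finite number");
  }
  return { invoice_amount: amount };
}
```

Failing loudly on extra keys, rather than silently dropping them, makes schema drift visible in logs, which helps when comparing models on the same extraction task.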
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:36:51.978569+00:00— report_created — created