Report #51631
[cost_intel] Using reasoning models for simple schema-constrained extraction
Use cheap instruct models with constrained decoding (JSON mode, structured outputs defined with Zod schemas, or the Outlines library) for simple field extraction; reserve reasoning models for cases where filling a field requires complex inference from the source text (e.g., 'calculate the net profit from this narrative description of a business deal').
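The routing rule above can be sketched as a small per-schema decision. This is a minimal illustration, not a prescribed implementation: `Tier`, `FieldSpec`, `choose_tier`, and the `needs_inference` flag are hypothetical names introduced here, and the sketch assumes someone has already labelled which fields must be derived rather than copied from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical labels for the two model classes discussed in this report.
    INSTRUCT_CONSTRAINED = "instruct+constrained-decoding"
    REASONING = "reasoning"


@dataclass(frozen=True)
class FieldSpec:
    name: str
    # Assumption: set by a human or a cheap classifier. True when the value
    # must be derived (arithmetic, temporal reasoning, cross-referencing)
    # rather than copied verbatim from the source text.
    needs_inference: bool = False


def choose_tier(fields: list[FieldSpec]) -> Tier:
    """Route to a reasoning model only if at least one field needs inference."""
    if any(f.needs_inference for f in fields):
        return Tier.REASONING
    return Tier.INSTRUCT_CONSTRAINED


# Plain lookup fields: stay on the cheap tier.
contact_fields = [FieldSpec("name"), FieldSpec("email")]

# One derived field (net profit from a narrative) tips the whole job over.
deal_fields = [FieldSpec("buyer"), FieldSpec("net_profit", needs_inference=True)]

print(choose_tier(contact_fields).value)
print(choose_tier(deal_fields).value)
```

Routing per schema rather than per document keeps the decision cheap; a finer-grained variant could split a schema and send only the derived fields to the reasoning model.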
Journey Context:
Reasoning models 'overthink' simple extraction, generating spurious nested objects that are not in the schema and adding 5-10x latency. Instruct models with schema-constrained decoding (e.g., OpenAI's Structured Outputs or the open-source Outlines library) are forced to follow the schema at the token level, achieving higher accuracy at 10-100x lower cost. Note that plain JSON mode guarantees only syntactically valid JSON, not conformance to a particular schema. The crossover point comes when extraction requires arithmetic, temporal reasoning, or cross-referencing disparate parts of a long document. Simple 'name = John' extraction is strictly worse with reasoning.
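One way to measure the 'spurious nested objects' failure on unconstrained output is a post-hoc conformance check. The following is a minimal standard-library sketch; `SCHEMA` and `schema_violations` are hypothetical names, and the flat name-to-type schema is an assumption made for brevity (real schemas would typically be JSON Schema, Zod, or Pydantic definitions). With token-level constrained decoding, the unexpected-field case cannot occur by construction.

```python
import json

# Hypothetical flat schema: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def schema_violations(raw_json: str, schema: dict[str, type]) -> list[str]:
    """Return human-readable problems found in a model's JSON output."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Fields the model invented that the schema never asked for.
    for key in data:
        if key not in schema:
            problems.append(f"unexpected field: {key}")
    # Fields the schema requires that are absent or of the wrong type.
    for key, expected in schema.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return problems


clean = '{"name": "John", "age": 41}'
overthought = '{"name": "John", "age": 41, "details": {"source": "line 3"}}'

print(schema_violations(clean, SCHEMA))
print(schema_violations(overthought, SCHEMA))
```

Counting violations per model over a sample of documents gives a rough, comparable error rate before committing to either tier.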
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:09:22.458279+00:00— report_created — created