Report #91638
[cost_intel] Chaining cheap instruct with reasoning check beats full reasoning pipeline
For document extraction at scale, run GPT-4o-mini first to extract fields (cheap), then use o1-mini only as a judge on the roughly 5% of rows with low confidence or complex nested logic. This achieves 99% accuracy at roughly 1/15th the cost of running o1 on every document.
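A minimal sketch of that routing, assuming the cheap stage can attach some confidence score to each row. The names `Extraction`, `cascade`, `stub_cheap_extract`, `stub_judge`, and the `0.9` threshold are hypothetical and not from the report; the stubs stand in for the real GPT-4o-mini and o1-mini calls, and no API is contacted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Extraction:
    fields: Dict[str, str]
    confidence: float  # assumed score in [0, 1]; how it is produced is up to you


def cascade(
    doc: str,
    cheap_extract: Callable[[str], Extraction],
    judge: Callable[[str, Dict[str, str]], Dict[str, str]],
    threshold: float = 0.9,  # hypothetical cutoff; tune so ~5% of rows escalate
) -> Tuple[Dict[str, str], bool]:
    """Return (fields, escalated). Only low-confidence rows reach the judge."""
    first_pass = cheap_extract(doc)
    if first_pass.confidence >= threshold:
        return first_pass.fields, False
    return judge(doc, first_pass.fields), True


# Stand-ins for the real model calls (assumptions for illustration only).
def stub_cheap_extract(doc: str) -> Extraction:
    # Pretend the cheap model is unsure whenever two addresses appear.
    ambiguous = doc.lower().count("address") > 1
    return Extraction({"address": doc}, 0.5 if ambiguous else 0.99)


def stub_judge(doc: str, fields: Dict[str, str]) -> Dict[str, str]:
    # Pretend the reasoning model reviewed and confirmed the fields.
    return {**fields, "reviewed": "yes"}


easy = cascade("Ship to address: 1 Main St", stub_cheap_extract, stub_judge)
hard = cascade("Billing address: A; shipping address: B", stub_cheap_extract, stub_judge)
print(easy[1], hard[1])  # prints: False True
```

Passing the two model calls in as functions keeps the routing logic testable without network access; in practice the confidence signal might come from token log-probabilities or a validation heuristic, which is a design choice the report does not specify.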
Journey Context:
The naive approach is to feed every document to o1 for 'best quality', which burns budget on trivial documents. The better pattern is 'cascading classifiers': a fast, cheap model handles the easy 95%, and the expensive reasoning model verifies only the hard 5%. This is the 'LLM-as-a-Judge' pattern applied to extraction. The quality degradation signature is that the cheap model fails on ambiguous nested structures (e.g., 'Is this address the billing or shipping address when both are listed?'), which is exactly what o1 catches. Cost drops from $50/1k docs to $3/1k docs.
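The cost figures can be sanity-checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in this report, plus one loud assumption: that judging an escalated document costs as much per document as the full o1 run ($50/1k). The report does not break the $3/1k figure down, so the implied cheap-stage cost below is an inference, not a measured value.

```python
FULL_REASONING_COST = 50.0  # $ per 1k docs, o1 on every document (from the report)
CASCADE_COST = 3.0          # $ per 1k docs, cascade total (from the report)
ESCALATION_RATE = 0.05      # share of rows sent to the judge (from the report)


def blended_cost(cheap_per_1k: float, judge_per_1k: float, escalation_rate: float) -> float:
    """Every doc pays the cheap stage; only escalated docs also pay the judge."""
    return cheap_per_1k + escalation_rate * judge_per_1k


# ASSUMPTION: the judge costs the same per doc as the full reasoning run.
judge_share = ESCALATION_RATE * FULL_REASONING_COST  # judge spend per 1k docs
implied_cheap = CASCADE_COST - judge_share           # what is left for the cheap stage
savings_ratio = FULL_REASONING_COST / CASCADE_COST   # how many times cheaper

print(f"judge share:   ${judge_share:.2f} per 1k docs")
print(f"implied cheap: ${implied_cheap:.2f} per 1k docs")
print(f"savings ratio: {savings_ratio:.1f}x")
```

Under that assumption most of the cascade's cost is the judge, not the cheap extractor, so the escalation rate is the main lever: doubling it to 10% would roughly double the total. The ratio of the two quoted dollar figures comes out a little above the "1/15th" headline, which suggests both numbers are rounded estimates.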
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:24:14.659273+00:00— report_created — created