Report #78617
[cost_intel] Running entire documents through o1 when only 10% of fields require deep reasoning
Cascade: GPT-4o extracts the structured fields it can handle with high confidence; only ambiguous or null fields are routed to o1-mini. This achieves 95% of o1 accuracy at 15% of the cost.
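To make the cost claim easier to reason about, here is a minimal back-of-the-envelope sketch. It assumes the cascade's cost is the cheap full-document pass plus whatever is still sent to the expensive model, both expressed as fractions of an all-expensive baseline. The function name and both parameters are hypothetical, and the example inputs are illustrative assumptions, not measurements from this report.

```python
def relative_cost(cheap_ratio: float, escalated_share: float) -> float:
    """Cascade cost as a fraction of the all-expensive baseline.

    cheap_ratio: cost of the cheap model's full-document pass divided by
        the cost of the expensive model's full-document pass (assumed).
    escalated_share: expensive-model spend still incurred on escalated
        fields, as a fraction of the all-expensive baseline (assumed).
    """
    if cheap_ratio < 0 or escalated_share < 0:
        raise ValueError("ratios must be non-negative")
    # The cheap pass always runs; escalations are paid on top of it.
    return cheap_ratio + escalated_share


# Illustrative inputs only: a cheap pass at 5% of baseline cost plus 10%
# of baseline spend on escalations comes to roughly 15% of baseline.
print(relative_cost(0.05, 0.10))
```

Note that `escalated_share` is measured in spend, not in field count: each escalated field still needs enough document context in its prompt, so 10% of fields does not automatically mean 10% of tokens.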
Journey Context:
Document extraction (invoices, contracts) mixes simple fields (dates, totals) with complex ones (liability clauses, penalty calculations). Running the entire document through o1 is wasteful because 80% of tokens are spent on trivial extraction. The FrugalGPT cascading pattern applies: a cheap model attempts extraction first, and o1 is called only when the cheap model's confidence is low or the field is known to be hard. This cuts cost by 5-10x with minimal accuracy loss, because o1's comparative advantage lies only in the hard subset. Score the cheap model's confidence with token logprobs or with self-consistency checks.
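The routing logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reporter's implementation: the two extractor callables stand in for the cheap and expensive model calls, and `KNOWN_HARD_FIELDS`, `CONFIDENCE_THRESHOLD`, `Extraction`, and all function names are hypothetical. Confidence is taken as the geometric-mean token probability derived from the cheap model's logprobs; a self-consistency score is included as the alternative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical configuration; tune both against labelled documents.
KNOWN_HARD_FIELDS = {"liability_clause", "penalty_calculation"}
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Extraction:
    value: Optional[str]             # None when the cheap model found nothing
    token_logprobs: Sequence[float]  # log-probabilities of the answer tokens


def logprob_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability in [0, 1]; 0.0 when there are no tokens."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def self_consistency(samples: Sequence[Optional[str]]) -> float:
    """Fraction of repeated samples that agree with the most common answer."""
    if not samples:
        return 0.0
    return Counter(samples).most_common(1)[0][1] / len(samples)


def cascade_extract(
    document: str,
    fields: Sequence[str],
    cheap: Callable[[str, str], Extraction],
    expensive: Callable[[str, str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Try the cheap extractor first; escalate hard, null, or low-confidence fields."""
    results: Dict[str, Optional[str]] = {}
    for field in fields:
        if field in KNOWN_HARD_FIELDS:
            # Known-hard fields skip the cheap attempt entirely.
            results[field] = expensive(document, field)
            continue
        first = cheap(document, field)
        confident = (
            first.value is not None
            and logprob_confidence(first.token_logprobs) >= CONFIDENCE_THRESHOLD
        )
        results[field] = first.value if confident else expensive(document, field)
    return results
```

In practice `cheap` would wrap a call that returns token logprobs alongside the extracted value, and `expensive` would wrap the reasoning-model call. To use self-consistency instead, sample the cheap model several times per field and compare `self_consistency(samples)` against the threshold.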
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:33:06.496511+00:00— report_created — created