Report #35625
[cost_intel] High-volume data extraction with occasional complexity
Use a 'verification cascade': GPT-4o-mini handles extraction (95% of cases) → a confidence scorer checks each result → o3-mini runs only on low-confidence or ambiguous cases. This achieves 98% accuracy at $0.05/1K tokens versus $0.60/1K for a pure reasoning pipeline (a 12x cost reduction), with roughly 2x latency incurred only on the escalated edge cases.
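A minimal sketch of the cascade's control flow, assuming the two extractors and the confidence scorer are supplied as plain callables. The names `cascade`, `Extraction`, the `threshold` value of 0.8, and the three `*_stub` functions are hypothetical stand-ins for illustration; no real model API is called here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extraction:
    fields: dict
    tier: str  # "cheap" or "reasoning": which model produced the result


def cascade(
    text: str,
    cheap_extract: Callable[[str], dict],
    reasoning_extract: Callable[[str], dict],
    confidence: Callable[[str, dict], float],
    threshold: float = 0.8,  # assumed cut-off; tune on labelled data
) -> Extraction:
    """Run the cheap extractor first and escalate only on low confidence."""
    draft = cheap_extract(text)
    if confidence(text, draft) >= threshold:
        return Extraction(draft, "cheap")
    # Low confidence: pay for the reasoning model on this input only.
    return Extraction(reasoning_extract(text), "reasoning")


# Hypothetical stubs so the sketch runs without network access.
def cheap_stub(text: str) -> dict:
    return {"amount": text.split()[-1]}


def reasoning_stub(text: str) -> dict:
    return {"amount": "resolved:" + text.split()[-1]}


def confidence_stub(text: str, draft: dict) -> float:
    # Pretend that conditional wording signals an ambiguous case.
    return 0.3 if "unless" in text else 0.95


easy = cascade("total due 40", cheap_stub, reasoning_stub, confidence_stub)
hard = cascade("total due 40 unless waived", cheap_stub, reasoning_stub, confidence_stub)
print(easy.tier, hard.tier)
```

In a real deployment the stubs would be replaced by calls to the two models and by whatever scorer is used for routing; the control flow stays the same.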
Journey Context:
Reasoning models are overkill for structured data with clear schemas, because most extractions are pattern matching. However, edge cases (nested conditionals, implicit references) break instruct models. A confidence-based router sends only 5-10% of traffic to the reasoning model, preserving the 'cost-per-correct-answer' curve. This is superior to ensemble voting, which multiplies cost linearly with the number of voters. The key is to train a lightweight classifier (BERT-size) to do the routing rather than relying on LLM self-reflection, which doubles cost.
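The cost argument above can be checked with a little arithmetic. This is a sketch under stated assumptions: the $0.60/1K reasoning price and the 5-10% escalation range come from the report, while the $0.02/1K cheap-tier price is a hypothetical value back-solved so that a 5% escalation rate gives the report's $0.05/1K blended figure. The router's own cost is assumed negligible, and the function names are illustrative.

```python
def blended_cost(cheap_price: float, reasoning_price: float,
                 escalation_rate: float, router_price: float = 0.0) -> float:
    """Cost per 1K tokens when every input hits the cheap model and a
    fraction `escalation_rate` is also sent to the reasoning model."""
    return cheap_price + router_price + escalation_rate * reasoning_price


def ensemble_cost(model_price: float, voters: int) -> float:
    """Ensemble voting pays for every voter on every input."""
    return model_price * voters


REASONING_PRICE = 0.60  # $/1K tokens, from the report
CHEAP_PRICE = 0.02      # $/1K tokens, ASSUMED (back-solved, not from the report)

for rate in (0.05, 0.10):
    cost = blended_cost(CHEAP_PRICE, REASONING_PRICE, rate)
    print(f"escalation {rate:.0%}: ${cost:.3f}/1K, "
          f"{REASONING_PRICE / cost:.1f}x cheaper than pure reasoning")

# Self-reflection modelled as a second pass of the same cheap model.
print(f"self-reflection: ${ensemble_cost(CHEAP_PRICE, 2):.3f}/1K")
```

Under these assumed prices, a 5% escalation rate reproduces the report's $0.05/1K figure, while a 10% rate comes to about $0.08/1K, so the saving is sensitive to how much traffic the router escalates.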
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:16:07.158377+00:00— report_created — created