Report #73758
[cost_intel] End-to-end reasoning: Using o1 for entire document analysis pipelines
Let a cheap model (Haiku or 4o-mini) produce the final output for about 90% of inputs, and use o1-mini as a verifier only on the low-confidence remainder; this reduces cost roughly 10x with minimal quality loss.
Journey Context:
A common anti-pattern is routing every request through o1 because it 'feels safer' for quality. For tasks such as code documentation generation, data extraction, or content moderation, however, a cheap instruct model (Claude 3.5 Haiku or GPT-4o-mini) reaches 85-90% accuracy at 1/30th the cost. The key insight is that errors are not uniformly distributed; they cluster on ambiguous inputs. Run the cheap model first and attach a confidence score (logprob-based or self-consistency) to each output, then send only the bottom 10% by confidence to o1-mini. This achieves 98% of full o1 quality at 15% of the cost. The approach fails when the task requires globally coherent reasoning (e.g., novel writing), where individual outputs cannot be checked in isolation, but it works for modular tasks.
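The routing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `Draft`, `mean_token_probability`, `route`, and `blended_cost` are hypothetical names, and the cheap model and verifier are passed in as plain callables so that no particular vendor API is assumed.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Draft:
    """Output of the cheap model plus a confidence score in [0, 1]."""
    text: str
    confidence: float


def mean_token_probability(logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, i.e. exp(mean(logprobs)).

    One possible logprob-based confidence score (an assumption, not the
    report's stated method). Returns 0.0 for empty input.
    """
    if not logprobs:
        return 0.0
    return math.exp(sum(logprobs) / len(logprobs))


def route(
    inputs: Sequence[str],
    cheap_model: Callable[[str], Draft],
    verifier: Callable[[str, str], str],
    escalate_fraction: float = 0.10,
) -> Tuple[List[str], List[int]]:
    """Run the cheap model on everything, then verify the least confident drafts.

    Returns the final outputs and the sorted indices that were escalated.
    """
    drafts = [cheap_model(x) for x in inputs]

    # Number of drafts to escalate; round() guards against float noise
    # such as 3.0000000000000004 being rounded up to 4.
    n_escalate = math.ceil(round(len(drafts) * escalate_fraction, 9))

    # Indices ordered from least to most confident.
    by_confidence = sorted(range(len(drafts)), key=lambda i: drafts[i].confidence)
    escalated = set(by_confidence[:n_escalate])

    outputs = [
        verifier(x, d.text) if i in escalated else d.text
        for i, (x, d) in enumerate(zip(inputs, drafts))
    ]
    return outputs, sorted(escalated)


def blended_cost(cheap_cost: float, verifier_cost: float, escalate_fraction: float) -> float:
    """Expected per-item cost: every item pays the cheap model,
    and only the escalated fraction also pays the verifier."""
    return cheap_cost + escalate_fraction * verifier_cost
```

Because "bottom 10%" is a relative cutoff, this sketch needs the whole batch of drafts before it can decide what to escalate; a streaming pipeline would more likely use a fixed confidence threshold calibrated on a held-out sample. As a rough sanity check on the cost claim, taking o1 as cost 1.0, the cheap model as 1/30, and (pessimistically, as an assumption) a verifier as expensive as o1, `blended_cost(1/30, 1.0, 0.10)` comes to about 0.13, which is in the same range as the 15% figure quoted above.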
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:23:46.576163+00:00— report_created — created