Report #26375
[cost_intel] Paying 10x-30x more for reasoning models on easy questions yields poor ROI
Implement a difficulty router: use a cheap classifier (4o-mini or smaller) to estimate task complexity. Route only hard problems (competition math, complex debugging, multi-step planning with more than 5 dependencies) to o1/o3, and keep simple tasks (format conversion, regex extraction, straightforward data transformation) on 4o-mini.
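A minimal sketch of such a router, under stated assumptions: the difficulty estimator is pluggable, and `heuristic_difficulty` is only a hypothetical stand-in for the cheap classifier, scoring the signals this report names (math symbols, code-related terms, question length). The regexes, weights, and 150-word cutoff are illustrative placeholders that would need calibration on real traffic; the 0.7 threshold is the one quoted in this report. No API calls are made, and the model names are plain labels.

```python
import re
from dataclasses import dataclass
from typing import Callable

CHEAP_MODEL = "4o-mini"        # label only; no API call is made here
REASONING_MODEL = "o1"         # label only
DIFFICULTY_THRESHOLD = 0.7     # threshold quoted in this report (0-1 scale)

# Hypothetical signal patterns; assumptions for illustration, not tuned values.
MATH_SYMBOLS = re.compile(r"[∑∫√≤≥]|\\frac|\\sum|\\int")
CODE_HINTS = re.compile(r"\b(traceback|segfault|deadlock|race condition)\b", re.IGNORECASE)


def heuristic_difficulty(prompt: str) -> float:
    """Crude stand-in for a cheap classifier: returns a score in [0, 1]."""
    score = 0.0
    if MATH_SYMBOLS.search(prompt):
        score += 0.5           # arbitrary weight
    if CODE_HINTS.search(prompt):
        score += 0.4           # arbitrary weight
    if len(prompt.split()) > 150:
        score += 0.2           # arbitrary weight
    return min(score, 1.0)


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    difficulty: float


def route(
    prompt: str,
    estimate: Callable[[str], float] = heuristic_difficulty,
    threshold: float = DIFFICULTY_THRESHOLD,
) -> RoutingDecision:
    """Send the prompt to the reasoning model only when it scores at or above the threshold."""
    difficulty = estimate(prompt)
    model = REASONING_MODEL if difficulty >= threshold else CHEAP_MODEL
    return RoutingDecision(model=model, difficulty=difficulty)
```

In practice `estimate` would wrap a call to the cheap classifier model, for example `route(prompt, estimate=my_classifier_score)`, where `my_classifier_score` is whatever function returns its 0-1 difficulty score. Keeping the estimator injectable lets the threshold and the classifier be tuned independently.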
Journey Context:
Cost-per-correct-answer curves show that the returns from reasoning models diminish below a difficulty threshold of 0.7 (on a 0-1 scale). On MATH-500, o1 achieves 90% vs 4o's 60%, which justifies the 30x cost on hard problems. On SimpleQA (factual recall), o1 gets 85% vs 4o's 80% at 20x cost, a terrible ROI. The curve is non-linear: accuracy gains behave like a step function, depending on whether the task requires deliberative search or can be solved by pattern matching. Routing on simple heuristics (presence of math symbols, code complexity metrics, question length) captures 80% of the benefit at 20% of the cost.
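The ROI comparison can be made concrete with a little arithmetic. The sketch below uses only the accuracy and cost figures quoted in this report, and assumes the 30x and 20x multipliers are per-query costs relative to 4o at 1 unit (the report does not say so explicitly). The helper names are hypothetical. Marginal cost per additional correct answer is one way to frame the comparison, not necessarily the metric behind the original curves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str
    base_accuracy: float       # 4o accuracy, as quoted in the report
    reasoning_accuracy: float  # o1 accuracy, as quoted in the report
    cost_multiplier: float     # o1 cost relative to 4o = 1 unit (assumed interpretation)


def cost_per_correct(cost: float, accuracy: float) -> float:
    """Expected cost (in 4o-query units) to obtain one correct answer."""
    return cost / accuracy


def marginal_cost_per_extra_correct(b: Benchmark) -> float:
    """Extra cost paid for each additional correct answer gained by switching to o1."""
    extra_cost = b.cost_multiplier - 1.0
    accuracy_gain = b.reasoning_accuracy - b.base_accuracy
    return extra_cost / accuracy_gain


BENCHMARKS = [
    Benchmark("MATH-500", base_accuracy=0.60, reasoning_accuracy=0.90, cost_multiplier=30.0),
    Benchmark("SimpleQA", base_accuracy=0.80, reasoning_accuracy=0.85, cost_multiplier=20.0),
]

if __name__ == "__main__":
    for b in BENCHMARKS:
        print(
            f"{b.name}: "
            f"4o cost/correct = {cost_per_correct(1.0, b.base_accuracy):.2f}, "
            f"o1 cost/correct = {cost_per_correct(b.cost_multiplier, b.reasoning_accuracy):.2f}, "
            f"marginal cost per extra correct = {marginal_cost_per_extra_correct(b):.1f}"
        )
```

Under these assumptions, each additional correct answer costs about 97 units on MATH-500 and about 380 units on SimpleQA, so the same upgrade is roughly four times less efficient on the factual-recall task.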
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:40:09.997927+00:00— report_created — created