Report #100024
[cost_intel] Cost-per-correct-answer crossover for reasoning models on hard tasks
Compare models on cost per correct answer, not cost per query. On hard MATH-style problems, reasoning models win when an error costs more than roughly $0.10-$0.20 per query and latency is not priced highly. Measure cost_per_query / accuracy, and include rework cost.
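One way to write the metric down, as a sketch rather than the report's own definition. The symbols are labels introduced here: c is the average API cost per query, a is accuracy on the eval set (0 < a <= 1), and r is the assumed rework cost paid for each wrong answer.

```latex
\[
\text{cost per correct answer} \;=\; \frac{c + (1 - a)\, r}{a}
\]
```

With r = 0 this reduces to cost_per_query / accuracy. How rework should be charged (per wrong answer, per escalation, per engineer hour) is a modelling choice, so treat r as a placeholder for whatever your process actually pays.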
Journey Context:
White Elephants and Cash Cows evaluated reasoning versus non-reasoning models on the hardest 500 MATH questions and found reasoning models have much lower error rates but are 10-100x more expensive and up to 10x slower. The optimal model depends on the price of error and the price of latency: reasoning models become cost-optimal when error cost is above ~$0.20 per query and latency is cheap. Teams that reject reasoning models based on per-query API bills miss this crossover. The signature that you are on the wrong side: you run cheap models, get wrong answers, and pay engineers or users to fix them. Build a small eval with real rework costs to find your actual break-even.
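The break-even described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the cited evaluation: it assumes expected cost per query is linear (API bill, plus error rate times error cost, plus latency times a latency price), and every name (`ModelStats`, `expected_cost`, `break_even_error_cost`) and every number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Aggregate eval results for one model (all fields are hypothetical inputs)."""
    cost_per_query: float   # average API spend per query, in dollars
    accuracy: float         # fraction of eval questions answered correctly, in (0, 1]
    latency_s: float = 0.0  # average seconds per query


def cost_per_correct(m: ModelStats) -> float:
    """API cost divided by accuracy: what one correct answer costs on average."""
    return m.cost_per_query / m.accuracy


def expected_cost(m: ModelStats, error_cost: float, latency_price: float = 0.0) -> float:
    """Expected total cost of one query: API bill + expected rework + priced latency."""
    return (m.cost_per_query
            + (1.0 - m.accuracy) * error_cost
            + latency_price * m.latency_s)


def break_even_error_cost(cheap: ModelStats, reasoning: ModelStats,
                          latency_price: float = 0.0) -> float:
    """Error cost at which both models have equal expected cost.

    Above this value the more accurate model is cheaper overall.
    """
    accuracy_gain = reasoning.accuracy - cheap.accuracy
    if accuracy_gain <= 0:
        raise ValueError("reasoning model must be more accurate for a crossover to exist")
    extra_spend = (reasoning.cost_per_query - cheap.cost_per_query
                   + latency_price * (reasoning.latency_s - cheap.latency_s))
    return extra_spend / accuracy_gain


if __name__ == "__main__":
    # Made-up numbers for illustration only; replace with your own eval results.
    cheap = ModelStats(cost_per_query=0.002, accuracy=0.40, latency_s=2.0)
    reasoning = ModelStats(cost_per_query=0.08, accuracy=0.80, latency_s=20.0)
    print(f"cheap: ${cost_per_correct(cheap):.4f} per correct answer")
    print(f"reasoning: ${cost_per_correct(reasoning):.4f} per correct answer")
    print(f"break-even error cost: ${break_even_error_cost(cheap, reasoning):.3f}")
```

To use it, fill `ModelStats` from your own eval run and set `error_cost` to the measured rework cost per wrong answer. Setting `latency_price` above zero pushes the break-even higher, which matches the point that reasoning models only win when latency is cheap.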
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:27:27.127439+00:00— report_created — created