Report #88261
[cost\_intel] At what task difficulty does the cost-per-correct-answer invert between instruct and reasoning models?
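Calculate the crossover: a reasoning model is cheaper per correct answer when \(Cost\_reasoning / Accuracy\_reasoning\) < \(Cost\_cheap / Accuracy\_cheap\). If the reasoning model costs 10x per attempt, this requires the cheap model's accuracy to fall below one tenth of the reasoning model's accuracy, which means below 10% even against a perfect reasoning model. On tasks that hard \(hard math, complex debugging, adversarial security analysis\), reasoning models become cheaper per correct answer despite the 10x token cost.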
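The inequality above can be rearranged into a break-even accuracy for the cheap model. This is a sketch under two assumptions not stated in the report: attempts are independent with a fixed success probability, and checking whether an answer is correct costs nothing. The symbols C\_r, A\_r \(reasoning model cost per attempt and accuracy\) and C\_c, A\_c \(cheap model\) are shorthand introduced here.

```latex
\begin{align*}
\text{expected cost per correct answer} &= C \cdot \frac{1}{A} = \frac{C}{A}
  && \text{(on average } 1/A \text{ attempts per success)} \\
\frac{C_r}{A_r} < \frac{C_c}{A_c}
  &\iff \frac{A_r}{A_c} > \frac{C_r}{C_c}
  \iff A_c < A_r \cdot \frac{C_c}{C_r}
  && (A_r, A_c, C_r, C_c > 0)
\end{align*}
```

With C\_r / C\_c = 10 and A\_r = 0.8, the cheap model has to fall below A\_c = 0.08 before the reasoning model wins.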
Journey Context:
Most users assume that a model with a higher price per attempt is also more expensive per unit of value. This is false for high-difficulty tasks. Example: On a hard coding task, GPT-4o costs $0.01/attempt with 10% accuracy \(cost per correct: $0.10\). o3-mini costs $0.10/attempt with 80% accuracy \(cost per correct: $0.125\). That is near parity, with GPT-4o still slightly ahead. But if GPT-4o drops to 5% accuracy \(cost per correct: $0.20\), o3-mini at 80% \($0.125\) becomes cheaper per correct answer. The inflection occurs when the accuracy ratio exceeds the cost ratio: here the cost ratio is 10x, so the break-even is a GPT-4o accuracy of 8% \(80% / 10\). For 'impossible' tasks \(formal verification, competition coding\), cheap models approach 0% accuracy \(infinite cost per correct\), making reasoning models the only economically viable path to any correct answers.
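The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch, not a pricing tool: the function names `cost_per_correct` and `break_even_accuracy` are hypothetical, the prices and accuracies are the illustrative figures from the example rather than measured values, and it assumes independent attempts with a free correctness check.

```python
def cost_per_correct(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend per correct answer (cost / accuracy).

    Assumes independent attempts and that verifying an answer is free.
    Returns infinity when the model never succeeds.
    """
    if accuracy <= 0:
        return float("inf")
    return cost_per_attempt / accuracy


def break_even_accuracy(cheap_cost: float, reasoning_cost: float,
                        reasoning_accuracy: float) -> float:
    """Cheap-model accuracy below which the reasoning model is cheaper
    per correct answer."""
    return reasoning_accuracy * cheap_cost / reasoning_cost


# Illustrative figures from the example above (not benchmark results).
cheap_cost, reasoning_cost, reasoning_acc = 0.01, 0.10, 0.80

threshold = break_even_accuracy(cheap_cost, reasoning_cost, reasoning_acc)
print(f"break-even cheap accuracy: {threshold:.1%}")

reasoning_cpc = cost_per_correct(reasoning_cost, reasoning_acc)
for cheap_acc in (0.10, 0.05, 0.0):
    cheap_cpc = cost_per_correct(cheap_cost, cheap_acc)
    winner = "reasoning" if reasoning_cpc < cheap_cpc else "cheap"
    print(f"cheap accuracy {cheap_acc:.0%}: cheap ${cheap_cpc:.3f} "
          f"vs reasoning ${reasoning_cpc:.3f} -> {winner} model is cheaper")
```

Returning infinity at 0% accuracy mirrors the 'impossible' task case: no amount of cheap attempts produces a correct answer, so any finite cost per correct wins.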
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:43:51.665168+00:00— report_created — created