Report #73500
[cost\_intel] Assuming reasoning models always have lower cost-per-correct-answer
Calculate cost-per-correct-answer = \(input\_cost \+ reasoning\_tokens\_cost\) / accuracy. For structured extraction and classification, cheap models win because accuracy is already near the ceiling. For ambiguous legal/medical reasoning, reasoning models can win despite 10x token cost, because 2-3x accuracy gains outweigh the extra spend once the cost of wrong answers is counted.
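A minimal Python sketch of the formula above. The function name `cost_per_correct` and the per-query dollar figures in the usage lines are illustrative assumptions, not measurements from any benchmark.

```python
def cost_per_correct(input_cost: float, reasoning_tokens_cost: float, accuracy: float) -> float:
    """Dollars spent per correct answer.

    Costs are dollars per query; accuracy is a fraction in (0, 1].
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return (input_cost + reasoning_tokens_cost) / accuracy


# Hypothetical per-query costs, for illustration only: same accuracy,
# but the reasoning model spends 20x as much in total.
cheap = cost_per_correct(input_cost=0.002, reasoning_tokens_cost=0.0, accuracy=0.98)
reasoning = cost_per_correct(input_cost=0.002, reasoning_tokens_cost=0.038, accuracy=0.98)
print(f"cheap: ${cheap:.4f}  reasoning: ${reasoning:.4f}")
```

When accuracy is equal, the ratio of the two results is just the ratio of per-query costs, which is why the cheaper model wins on near-deterministic tasks.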
Journey Context:
The common trap is comparing $/1M tokens instead of $/correct answer. On a medical diagnosis dataset, GPT-4o might cost $0.01 per query and get 60% accuracy \($0.0167 per correct\). o1 might cost $0.10 and get 85% accuracy \($0.118 per correct\). On raw cost per correct answer GPT-4o is still about 7x cheaper; o1 comes out ahead only once errors are priced in. If each wrong answer costs E, the expected cost per query is $0.01 \+ 0.40 × E for GPT-4o versus $0.10 \+ 0.15 × E for o1, so o1 is cheaper whenever E exceeds roughly $0.36. Conversely, on entity extraction, both get 98% accuracy, but o1 costs 20x more, so it is strictly worse. The decision rule: high-stakes \+ high-ambiguity → reasoning; low-stakes \+ deterministic → instruct.
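The "if errors are expensive" condition can be made concrete with a small sketch. The expected-cost model used here (per-query cost plus error rate times a fixed dollar cost per wrong answer) is an assumption added for illustration and is not part of the original report; the function names are hypothetical, and the inputs are the illustrative figures from the paragraph above.

```python
def expected_cost(query_cost: float, accuracy: float, error_cost: float) -> float:
    """Expected dollars per query, counting the cost of wrong answers."""
    return query_cost + (1.0 - accuracy) * error_cost


def break_even_error_cost(cheap_cost: float, cheap_acc: float,
                          strong_cost: float, strong_acc: float) -> float:
    """Per-error cost above which the pricier, more accurate model is cheaper overall."""
    if strong_acc <= cheap_acc:
        # No accuracy gain: the extra spend never pays for itself.
        return float("inf")
    return (strong_cost - cheap_cost) / (strong_acc - cheap_acc)


# Illustrative figures from the text: $0.01 at 60% vs $0.10 at 85%.
threshold = break_even_error_cost(0.01, 0.60, 0.10, 0.85)
print(f"break-even error cost: ${threshold:.2f}")
```

Under these assumptions the reasoning model pays off only when a wrong answer costs more than the printed threshold. With equal accuracy, as in the entity extraction case, the function returns infinity and the cheaper model always wins.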
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T05:57:41.681005+00:00— report_created — created