Report #49040
[cost\_intel] How do I calculate the true cost-per-correct-answer when choosing between model tiers for knowledge work?
Calculate: \(Input tokens × Input price \+ Output tokens × Output price\) / \(Accuracy %\). For GPQA-diamond level tasks, o3-mini achieves $0.15/correct answer vs GPT-4o's $2.40/correct \(16x cheaper\) due to 90% vs 40% accuracy. For simple QA, 4o-mini is $0.001 vs o1's $0.50 per correct answer.
Journey Context:
Raw price per token is misleading; accuracy dominates the denominator. On hard reasoning tasks, cheaper models have near-zero accuracy, making their effective cost infinite. Conversely, on easy tasks, reasoning model overkill creates 100x cost inflation. The inflection point is where 4o accuracy drops below 70% - that's where reasoning becomes cost-effective despite 10x price.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:48:05.652411+00:00— report_created — created