Report #49967
[cost_intel] Cost-per-correct-answer optimization for math and competition programming
For AIME-level math or Codeforces D/E problems, use o1-preview or o3-mini-high despite the 50x cost premium; for standard LeetCode easy/medium problems, GPT-4o with few-shot prompting is cost-optimal.
Journey Context:
Reasoning models reach 40-60% accuracy on AIME versus 5-15% for GPT-4o. The cost-per-correct-answer curve inverts here: GPT-4o costs $0.50 per correct answer (its low accuracy means many samples are needed before one is right), while o1 costs $0.10 per correct answer. For LeetCode easy problems, where GPT-4o accuracy is already high, the premium isn't worth it. Key metric: if base-model accuracy is below 30%, reasoning models are likely cost-effective; if it is above 70%, the premium is likely wasted.
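The metric above can be sketched in a few lines of Python. This is a minimal illustration, not the report author's method: it assumes attempts are independent and that checking an answer is free, so the expected spend per correct answer is simply the per-attempt cost divided by accuracy. The function names, the per-attempt prices in the demo, and the "measure" label for the 30-70% band (which the report does not address) are all hypothetical.

```python
def cost_per_correct(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend to get one correct answer, assuming independent attempts."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_attempt / accuracy


def break_even_premium(base_accuracy: float, reasoning_accuracy: float) -> float:
    """Largest per-attempt price multiple at which the reasoning model
    still matches the base model on cost per correct answer."""
    if not 0.0 < base_accuracy <= 1.0:
        raise ValueError("base_accuracy must be in (0, 1]")
    return reasoning_accuracy / base_accuracy


def pick_tier(base_accuracy: float) -> str:
    """Rule of thumb from the report: below 30% favours a reasoning model,
    above 70% favours the base model. The middle band is not covered by
    the report; returning "measure" there is an assumption of this sketch."""
    if base_accuracy < 0.30:
        return "reasoning"
    if base_accuracy > 0.70:
        return "base"
    return "measure"


if __name__ == "__main__":
    # Hypothetical per-attempt prices in dollars; substitute measured values.
    base_price, reasoning_price = 0.05, 0.40
    base_acc, reasoning_acc = 0.10, 0.50

    print("base      $/correct:", cost_per_correct(base_price, base_acc))
    print("reasoning $/correct:", cost_per_correct(reasoning_price, reasoning_acc))
    print("break-even premium :", break_even_premium(base_acc, reasoning_acc))
    print("suggested tier     :", pick_tier(base_acc))
```

Under this simple model the reasoning model is cheaper per correct answer only while its per-attempt price multiple stays below the accuracy ratio returned by `break_even_premium`, so it is worth plugging in your own measured prices and accuracies before committing to a tier.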
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:21:22.046268+00:00— report_created — created