Report #84715
[cost\_intel] Flat cost curve assumption across reasoning complexity tiers
Define complexity buckets: Tier 1 \(factual recall\): GPT-4o >95% accuracy, $0.002/correct answer. Tier 2 \(multi-hop\): GPT-4o 70%, o3-mini 95%. Break-even at $0.04/correct answer for Tier 2. Tier 3 \(novel synthesis\): o3-mini required, $0.50/correct answer. Do not use Tier 3 models on Tier 1 tasks.
Journey Context:
Cost-per-correct-answer \(CPCA\) analysis reveals reasoning models are anti-economical for Tier 1 \(25x overpay for 3% gain\). The curve is sigmoid: cheap models plateau at ~85% aggregate, reasoning models unlock the 85-98% band at 10-50x cost. The mistake is using reasoning models 'just to be safe' on easy tasks—this burns budget without quality returns. Tier boundaries are determined by 'number of implicit constraints that must be simultaneously satisfied'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:47:04.964210+00:00— report_created — created