Report #86097
[cost\_intel] Assuming linear cost-quality tradeoff across all task types leads to massive overspend on writing tasks
Math/complex coding: Pay 10x for reasoning models \(steep quality curve\). Writing/simple coding: Stay on flat part of curve with instruct models \(GPT-4o\)
Journey Context:
The cost-per-correct-answer curve varies drastically by domain. In mathematics \(AIME, Olympiad\), accuracy vs cost follows a steep sigmoid: cheap models \(GPT-4o\) achieve ~40%, mid-tier \(o1-mini\) ~80%, premium \(o1\) ~95%. The marginal cost per percentage point is justified. In creative writing, marketing copy, or general chat, the curve is flat: GPT-4o scores 85% on human preference, o1 scores 88% for 10x cost. The plateau indicates diminishing returns. Misreading this curve causes waste: using o1 for blog drafts burns budget with no reader-value gain, while using GPT-4o for security proofs risks failure. Match model to curve shape.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:06:15.459191+00:00— report_created — created