Report #82837
[cost_intel] Using cheap models for reasoning tasks causes 3x cost increase via retry loops
Route tasks by complexity. Use GPT-3.5-turbo or Claude Haiku for classification, entity extraction, and simple transformations (10x cheaper, 98% accuracy), and reserve GPT-4 or Claude Sonnet for multi-step reasoning, code generation, and creative tasks. Implement a routing classifier (a cheap model or a heuristic) to select the appropriate tier, and monitor failure rates per task type to detect quality cliffs.
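The routing and monitoring steps above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `Tier`, `route`, `CHEAP_TASKS`, and `FailureMonitor`, the task labels, and the default threshold values are all hypothetical, and no model API is called.

```python
from collections import defaultdict
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"          # e.g. a Haiku-class or GPT-3.5-class model
    EXPENSIVE = "expensive"  # e.g. a Sonnet-class or GPT-4-class model


# Hypothetical task labels, based on the task types named in the report.
CHEAP_TASKS = {"classification", "entity_extraction", "simple_transformation"}


def route(task_type: str) -> Tier:
    """Heuristic router: known-simple task types go cheap, everything else expensive."""
    return Tier.CHEAP if task_type in CHEAP_TASKS else Tier.EXPENSIVE


class FailureMonitor:
    """Tracks the failure rate per task type to flag quality cliffs."""

    def __init__(self, threshold: float = 0.10, min_samples: int = 20):
        self.threshold = threshold      # assumed alert level; tune per workload
        self.min_samples = min_samples  # avoids alerting on tiny samples
        self.calls = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, task_type: str, ok: bool) -> None:
        self.calls[task_type] += 1
        if not ok:
            self.failures[task_type] += 1

    def failure_rate(self, task_type: str) -> float:
        n = self.calls[task_type]
        return self.failures[task_type] / n if n else 0.0

    def should_promote(self, task_type: str) -> bool:
        """True when a task type fails often enough to move to the expensive tier."""
        return (self.calls[task_type] >= self.min_samples
                and self.failure_rate(task_type) > self.threshold)
```

Defaulting unknown task types to the expensive tier is a deliberate choice in this sketch: a misrouted simple task wastes some money, while a misrouted reasoning task enters the retry loop this report describes.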
Journey Context:
The cost-quality curve is non-linear. For classification with few-shot examples, Haiku reaches 98% of Opus accuracy at 1/20th the price. For reasoning tasks that require chain-of-thought, however, cheap models fail 40% of the time versus 5% for large models. Each failure triggers a retry or an escalation to the expensive model anyway, which can make the 'cheap first' strategy more expensive than calling the expensive model once. The trap is assuming that a cheap model's quality gap is the same across all task types.
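The retry-and-escalate cost described above can be estimated with a small expected-value calculation. The sketch below is a simplified model under stated assumptions: attempts are independent with a fixed failure rate, every failure is detected at no cost, and the escalated call always succeeds. The function name `cheap_first_cost` and its parameters are hypothetical.

```python
def cheap_first_cost(cheap_price: float, expensive_price: float,
                     cheap_fail_rate: float, cheap_attempts: int = 1,
                     retry_growth: float = 1.0) -> float:
    """Expected cost per task when trying the cheap model first, then escalating.

    cheap_attempts: number of cheap-model tries before escalating.
    retry_growth: price multiplier applied to each successive cheap attempt
        (greater than 1 if retries resend a longer prompt).
    """
    expected = 0.0
    p_reach = 1.0                 # probability that this attempt is needed
    attempt_price = cheap_price
    for _ in range(cheap_attempts):
        expected += p_reach * attempt_price
        p_reach *= cheap_fail_rate      # next step only runs if this one failed
        attempt_price *= retry_growth
    # Every cheap attempt failed: pay for one expensive call.
    expected += p_reach * expensive_price
    return expected
```

Comparing the result against `expensive_price` gives the break-even for a given workload. Price ratio and failure rate alone do not settle it: the number of retries, prompt growth on each retry, and any validation calls used to detect failures all add to the cheap-first side, so measured values belong here rather than illustrative ones.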
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:38:15.521590+00:00— report_created — created