Report #52385
[cost\_intel] Using Haiku 3.5 or GPT-4o-mini for multi-step reasoning tasks requiring 3\+ hops of inference \(e.g., 'compare these three contracts and find contradictions'\), resulting in catastrophic reasoning failures that require expensive re-runs with larger models
Restrict small models \(Haiku, Flash, Mini\) to single-step or parallelizable tasks \(classification, extraction, simple summarization\). For tasks requiring sequential reasoning, dependency tracking, or contradiction detection across multiple documents, use Sonnet or Pro. On dirty data, a small model can cost more than a single frontier-model call because of error correction loops and hallucination recovery. Rule-of-thumb threshold: if your input source error rate exceeds 5%, the 'cheap' model ends up roughly 3x more expensive once retry logic is included.
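The routing rule above can be sketched as a small dispatch function. This is a minimal illustration, not a tested production router: the function name `choose_model_tier`, the task labels, and the `"small"`/`"large"` tier names are all hypothetical, and the 3-hop and 5% cutoffs are simply the thresholds quoted in this report.

```python
# Hypothetical task labels; adapt to however your pipeline tags work items.
SINGLE_STEP_TASKS = {"classification", "extraction", "simple_summarization"}
MULTI_STEP_TASKS = {
    "sequential_reasoning",
    "dependency_tracking",
    "contradiction_detection",
}


def choose_model_tier(task_type, inference_hops=1, input_error_rate=0.0):
    """Return "small" or "large" following the report's rule of thumb."""
    # Multi-document or multi-hop reasoning goes straight to the larger model.
    if task_type in MULTI_STEP_TASKS or inference_hops >= 3:
        return "large"
    # Dirty inputs (error rate above 5%) are assumed to trigger costly retries.
    if input_error_rate > 0.05:
        return "large"
    if task_type in SINGLE_STEP_TASKS:
        return "small"
    # Unknown task types default to the larger model (a conservative assumption).
    return "large"


print(choose_model_tier("classification"))
print(choose_model_tier("contradiction_detection", inference_hops=3))
print(choose_model_tier("extraction", input_error_rate=0.08))
```

Defaulting unknown task types to the larger tier trades some cost for safety; a team with good failure detection might reasonably choose the opposite default.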
Journey Context:
Teams benchmark small models on clean dev sets, see 95% accuracy, then deploy to production workloads that require complex reasoning. Haiku is surprisingly brittle on long reasoning chains: it appears to lack the working memory for multi-hop inference, so it will confidently hallucinate connections between documents or miss logical contradictions that Sonnet catches. The economic calculation is subtle. Using the figures quoted here, Haiku costs $0.80/million and Sonnet costs $15/million output. If Haiku fails 15% of the time and each failure requires a Sonnet retry, the effective cost is 0.85\*0.80 \+ 0.15\*\(0.80\+15\) = $3.05/million, plus latency penalties. At a 20% failure rate the same formula gives $3.80/million, so on token price alone the retry path is still cheaper than Sonnet's $15/million; Sonnet becomes cheaper AND better only once latency, error correction loops, and the cost of failures that go undetected are counted.
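The blended-cost expression above can be evaluated directly for any failure rate. The sketch below assumes the per-million prices quoted in this report and that every small-model failure is detected and retried exactly once on the larger model. The function names and the `overhead_per_failure` parameter (a stand-in for latency, orchestration, or review cost per failed run) are hypothetical additions for illustration.

```python
def effective_cost(failure_rate, small_price, large_price, overhead_per_failure=0.0):
    """Expected $/million when failed small-model runs are retried on the large model."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Successful runs pay only the small-model price.
    success = (1.0 - failure_rate) * small_price
    # Failed runs pay for the wasted small-model call, the retry, and any overhead.
    failure = failure_rate * (small_price + large_price + overhead_per_failure)
    return success + failure


def break_even_failure_rate(small_price, large_price, overhead_per_failure=0.0):
    """Failure rate at which small-then-retry costs the same as the large model alone."""
    # Solves small + p * (large + overhead) = large for p.
    return (large_price - small_price) / (large_price + overhead_per_failure)


HAIKU = 0.80    # $/million, as quoted in this report
SONNET = 15.0   # $/million output, as quoted in this report

cost_15 = effective_cost(0.15, HAIKU, SONNET)
cost_20 = effective_cost(0.20, HAIKU, SONNET)
break_even = break_even_failure_rate(HAIKU, SONNET)

print(f"15% failures: ${cost_15:.2f}/million")
print(f"20% failures: ${cost_20:.2f}/million")
print(f"break-even failure rate (token price only): {break_even:.1%}")
```

Raising `overhead_per_failure` lowers the break-even failure rate, which is one way to quantify how much non-token cost each failed run has to carry before going straight to the larger model wins.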
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:25:18.136491+00:00— report_created — created