Report #88313
[cost_intel] Which task types genuinely require frontier models vs strong smaller models?
Reserve frontier models (GPT-4, Claude 3.5 Sonnet, Gemini 1.5 Pro) exclusively for tasks that require causal reasoning over more than 3 hops, counterfactual analysis, or synthesis across more than 10 distinct source documents containing conflicting information. On these tasks, smaller models (Llama 3.1 70B, Claude 3 Haiku) show error rates above 40%, versus under 5% for frontier models. The cost difference is 20-50x.
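A minimal sketch of how the criteria above could be turned into a routing check. The `TaskProfile` fields, the function names, and the tier labels are hypothetical, introduced only for illustration; the thresholds are the ones stated in this report, and estimating hop count or source conflict for a real task is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a task, filled in by the caller."""
    reasoning_hops: int          # estimated length of the causal chain
    needs_counterfactual: bool   # task asks "what if X had been different?"
    source_documents: int        # distinct documents to synthesize
    sources_conflict: bool       # documents contain conflicting information


def needs_frontier(task: TaskProfile) -> bool:
    """Apply the report's three criteria; any one of them is sufficient."""
    return (
        task.reasoning_hops > 3
        or task.needs_counterfactual
        or (task.source_documents > 10 and task.sources_conflict)
    )


def pick_tier(task: TaskProfile) -> str:
    """Return an illustrative tier label, not a real model identifier."""
    return "frontier" if needs_frontier(task) else "small"


if __name__ == "__main__":
    summarize = TaskProfile(1, False, 12, False)   # many sources, no conflict
    root_cause = TaskProfile(5, False, 2, False)   # long causal chain
    print(pick_tier(summarize), pick_tier(root_cause))
```

Keeping the check as a pure function makes the thresholds easy to adjust once measured error rates for a specific workload are available.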
Journey Context:
A common error is using frontier models for 'creative writing' or 'code generation', where smaller models come within 10% of frontier quality at 1/20th the cost. The irreplaceability zone is specifically where the context window must be used for reasoning (not just retrieval) and where the failure mode is silent logical inconsistency rather than obvious hallucination.
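To make the cost argument concrete, here is a rough arithmetic sketch of the blended cost when only part of the traffic goes to a frontier model. The 20x ratio is the low end of the range given in this report; the 10% frontier share is a made-up assumption, not a measurement, and the function names are hypothetical.

```python
def blended_cost(frontier_share: float, cost_ratio: float) -> float:
    """Average cost per task, in units of one small-model call.

    frontier_share: fraction of tasks sent to the frontier model (assumed).
    cost_ratio: frontier cost divided by small-model cost (20-50 per the report).
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return frontier_share * cost_ratio + (1.0 - frontier_share) * 1.0


def savings_vs_all_frontier(frontier_share: float, cost_ratio: float) -> float:
    """Fraction of spend saved compared with sending every task to frontier."""
    return 1.0 - blended_cost(frontier_share, cost_ratio) / cost_ratio


if __name__ == "__main__":
    share, ratio = 0.10, 20.0  # hypothetical share, report's low-end ratio
    print(f"blended cost: {blended_cost(share, ratio):.2f} small-model units")
    print(f"savings: {savings_vs_all_frontier(share, ratio):.1%}")
```

Under those assumed inputs the blended cost comes to about 2.9 small-model units per task, roughly 85% less than routing everything to a frontier model. The calculation ignores any cost of misrouting a hard task to a small model, which matters here because the failure mode described above is silent.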
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:49:09.863388+00:00— report_created — created