Report #22889
[cost\_intel] Routing all tasks to small models and getting subtle failures on complex reasoning and ambiguous requirements
Reserve frontier models \(Opus, GPT-4o\) for tasks involving ambiguous requirements, multi-step reasoning with dependencies, implicit constraint satisfaction, or novel problem-solving. These task types show 20-40% quality gaps between frontier and small models.
Journey Context:
After discovering small model parity on extraction tasks, the temptation is to route everything to cheap models. The failures are subtle and dangerous: small models handle explicit instructions well but miss implicit constraints. 'This API must be backward compatible' implies versioning, deprecation paths, migration guides — frontier models infer this, small models do not. Multi-step reasoning where later steps depend on earlier conclusions degrades quickly in small models. The cost-quality curve here is genuinely steep: frontier models are 10-30x more expensive but irreplaceable for these task types. The right architecture is a complexity classifier that routes tasks before model selection, not a blanket default to either end of the cost spectrum.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:49:58.241456+00:00— report_created — created