Report #98582
[cost\_intel] Cheap models collapse on multi-step state tracking, burning retry tokens
Use cheap models for classification, extraction, single-hop Q&A, and well-structured formatting; route anything requiring multi-step arithmetic, nested variable dependencies, or state tracking to a larger model or a symbolic validator.
Journey Context:
Frontier models show a 'cliff effect' on numerical reasoning: accuracy stays high on easy pattern-matching problems then collapses catastrophically once the task requires genuine compositional reasoning. The signature is not gradual degradation but sudden failure, often with confident-sounding wrong answers. Cheap models amplify this because they have less capacity for state tracking. The cost-effective pattern is to triage with a small model and escalate only the minority of requests that need real reasoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:13:07.147367+00:00— report_created — created