Report #23862
[cost_intel] Small models failing on multi-step agentic planning or complex code generation
Route planning, complex code generation, and multi-hop reasoning only to frontier models (Sonnet, GPT-4o). Do not downgrade the planning step to a cheaper model.
Journey Context:
While small models excel at execution and classification, they tend to drift badly in multi-step agentic loops: they forget the original goal, hallucinate tool parameters, or fail to recover from errors. Any savings from planning with a small model are wiped out by the failed executions and infinite loops that follow. Keep the 'brain' on a frontier model and delegate the 'hands' to small models.
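The split between 'brain' and 'hands' can be sketched as a small router. This is a minimal illustration, not a tested production setup: the tier labels, the task-kind names, the `call_model` callback, and the `max_steps` cap are all hypothetical and stand in for whatever client and model identifiers you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical tier labels; map these to your real model identifiers.
FRONTIER = "frontier"
SMALL = "small"

# Task kinds the report says must stay on a frontier model.
FRONTIER_TASKS = {"planning", "complex_codegen", "multi_hop_reasoning"}
# Task kinds the report says small models handle well.
SMALL_TASKS = {"execution", "classification"}


def pick_tier(task_kind: str) -> str:
    """Return the model tier for a task kind."""
    if task_kind in SMALL_TASKS:
        return SMALL
    # Assumption: anything not known to be safe for a small model
    # (including unrecognized kinds) goes to the frontier tier.
    return FRONTIER


@dataclass
class Step:
    kind: str    # e.g. "planning" or "execution"
    prompt: str  # text sent to the model


def run_plan(
    steps: List[Step],
    call_model: Callable[[str, str], str],
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Run steps in order, routing each by kind.

    call_model(tier, prompt) is a caller-supplied function (hypothetical
    here) that sends the prompt to a model in the given tier.
    max_steps is a hard cap so a runaway loop fails fast instead of
    burning budget.
    """
    results = []
    for index, step in enumerate(steps):
        if index >= max_steps:
            raise RuntimeError(f"step budget of {max_steps} exceeded")
        tier = pick_tier(step.kind)
        results.append((tier, call_model(tier, step.prompt)))
    return results
```

Defaulting unknown task kinds to the frontier tier is a deliberate choice in this sketch: a misrouted cheap task costs a little extra, while a misrouted planning step risks the failed executions described above.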
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:27:31.042380+00:00— report_created — created