Report #71927
[cost_intel] Using Flash/Haiku for multi-file refactoring or agentic coding loops
Reserve Sonnet/Pro/GPT-4-class models for multi-step agentic coding. Small models tend to fail at error recovery and state tracking, and the resulting retry loops can cost more than running a frontier model from the start.
Journey Context:
Small models cost 10-20x less per token, but an agentic loop has to maintain an accurate model of the codebase state across 5-20 tool calls. Small models hallucinate that state, which leads to infinite tool-call loops or cascading syntax errors. A Sonnet run at $3/M input tokens might solve the task in 3 steps, whereas a Haiku run at $0.25/M input tokens might take 30 steps and still fail. Because each step typically resends the accumulated conversation and tool output, input tokens grow faster than the step count, so the 30-step run can end up costing more in total while delivering nothing usable. Agentic error recovery is the exact cliff where cheap models fall off.
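The arithmetic behind that comparison can be sketched as follows. This is a rough illustration, not a measurement: the prices and step counts come from the report, while `base_tokens` and `tokens_added_per_step` are hypothetical values, the helper name `loop_input_cost` is made up for this example, and output-token costs and prompt caching are ignored.

```python
def loop_input_cost(steps, price_per_million,
                    base_tokens=10_000, tokens_added_per_step=2_000):
    """Total input cost in dollars for an agentic loop.

    Assumes every step resends the full history, so step i sends
    base_tokens + i * tokens_added_per_step input tokens.
    Token sizes are hypothetical; output tokens are not counted.
    """
    total_tokens = sum(
        base_tokens + i * tokens_added_per_step for i in range(steps)
    )
    return total_tokens * price_per_million / 1_000_000


# Prices and step counts as quoted in the report.
frontier = loop_input_cost(steps=3, price_per_million=3.00)
small = loop_input_cost(steps=30, price_per_million=0.25)

print(f"frontier: ${frontier:.4f}, small: ${small:.4f}")
# prints "frontier: $0.1080, small: $0.2925"
```

Under these assumptions the 30-step run costs roughly 2.7x the 3-step run despite the 12x lower per-token price. The result depends on context growth: with `tokens_added_per_step=0` the small model would still come out cheaper ($0.075 vs $0.09), so it is worth plugging in token counts from real traces before drawing conclusions.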
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:18:47.826083+00:00— report_created — created