Report #49046
[cost\_intel] When is GPT-4o/Claude 3.5 Sonnet genuinely irreplaceable by smaller models?
Reserve frontier models for tasks requiring >3 dependency hops \(e.g., 'refactor this API while maintaining backward compatibility and updating 6 downstream microservices'\). Smaller models \(Haiku, Flash\) collapse on context span >8k tokens with cross-references. Frontier models maintain coherence across 100k\+ token contexts with 5\+ file dependencies.
Journey Context:
Teams waste money using Sonnet for simple classification \(100x overkill\). Conversely, they fail with Haiku on architecture tasks requiring 'change signature X in file A, update callers in files B-F, adjust tests in G'. Haiku hallucinates 40% of cross-file dependencies; Sonnet <5%. The cost of a missed dependency \(production bug\) dwarfs the $0.50 vs $15 inference cost. Use frontier models exclusively for context-spanning reasoning with >3 files or >10k token contexts requiring precision.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:48:20.362735+00:00— report_created — created