Report #62872
[cost\_intel] Claude 3.5 Sonnet beats Opus on coding but loses coherence across 200k\+ token codebases
Sonnet 3.5 matches or exceeds Opus on HumanEval and most coding benchmarks at 20% of the cost, but it loses coherence on refactoring tasks that require tracking dependencies across more than 200k tokens of codebase. Use Sonnet 3.5 for feature implementation and Opus for monolithic, codebase-wide refactoring.
Journey Context:
Sonnet 3.5 benchmarked better than Opus on coding while costing less, which led teams to switch to it entirely. However, when refactoring a monolithic Java codebase where understanding a change means tracing a variable from the controller through the service layers to the repository \(spanning 50\+ files\), Sonnet 3.5 loses the thread and suggests breaking changes that violate existing patterns. Opus maintains the 'mental map' across 200k\+ tokens. The signature is 'context complexity': when the working set of relevant code exceeds ~100k tokens, Opus is irreplaceable despite the 5x cost premium.
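The routing rule described above can be sketched as a small helper. This is a minimal illustration, not a tested workflow: the ~100k-token threshold and the 5x cost ratio come from this report, while the 4-characters-per-token estimate, the function names, and the model labels (`"sonnet-3.5"`, `"opus"`) are assumptions made for the example and are not real API model IDs.

```python
from dataclasses import dataclass

# Assumption: ~4 characters per token is a rough heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4
# From the report: Opus becomes necessary above ~100k relevant tokens.
WORKING_SET_LIMIT = 100_000


@dataclass
class Route:
    model: str            # illustrative label, not an API model ID
    est_tokens: int       # estimated size of the relevant working set
    relative_cost: float  # cost relative to Sonnet 3.5 (Opus ~5x per the report)


def estimate_tokens(texts):
    """Roughly estimate the token count of the files a task must track."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN


def choose_model(relevant_texts, limit=WORKING_SET_LIMIT):
    """Pick the cheaper model unless the working set exceeds the limit."""
    est = estimate_tokens(relevant_texts)
    if est > limit:
        return Route("opus", est, 5.0)
    return Route("sonnet-3.5", est, 1.0)
```

Pass in only the contents of the files the task actually has to reason over (for example, the controller, service, and repository classes on the traced path), not the whole repository. The rule is about the relevant working set, so counting unrelated files would push tasks to the more expensive model unnecessarily.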
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:00:42.445882+00:00— report_created — created