Report #86949
[cost_intel] Defaulting to Claude 3 Opus for autonomous coding agents (tight edit-test-debug loops)
Claude 3.5 Sonnet beats Claude 3 Opus on SWE-bench by 13% while being 5x cheaper ($3 vs $15 per 1M input tokens) and 2x faster. Opus's verbose output also exhausts the context window sooner, forcing expensive re-summarization. Use Opus only for initial architecture design or for debugging novel algorithms that require deep reasoning, not for tight agent loops.
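To make the 5x price gap concrete, here is a minimal sketch of the input-token arithmetic. The two prices come from the report; the iteration count and tokens per iteration are illustrative assumptions, not measurements, and output-token costs are left out.

```python
# Input prices quoted in the report, in USD per 1M input tokens.
PRICE_PER_M_INPUT = {"sonnet": 3.0, "opus": 15.0}


def input_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Return the input-token cost in USD for a given token count."""
    return input_tokens * price_per_million / 1_000_000


# Hypothetical workload (assumed for illustration only):
# one agent session of 40 edit-test-debug iterations,
# each re-sending about 20,000 tokens of context.
ITERATIONS = 40
TOKENS_PER_ITERATION = 20_000
total_tokens = ITERATIONS * TOKENS_PER_ITERATION

for model, price in PRICE_PER_M_INPUT.items():
    print(f"{model}: ${input_cost_usd(total_tokens, price):.2f}")
```

Under these assumed numbers, one session costs $2.40 in input tokens on Sonnet versus $12.00 on Opus. Re-summarization passes would add further input tokens on top of this.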
Journey Context:
Developers assume that bigger is better for coding agents, but Opus is overkill for 'edit file, run test, parse error' loops. It generates verbose explanations that fill the context window quickly, forcing expensive re-summarization. Sonnet is sharp enough for tool use and file edits. The cliff comes when the agent needs to understand a novel 500-line algorithm and refactor it: there, Opus's reasoning depth prevents hallucinated edits that break semantics. Route to Opus only after Sonnet's edit diff has failed validation 3 times.
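The escalation rule above (stay on Sonnet until its diff fails validation 3 times, then route to Opus) could be sketched roughly as follows. Everything here is hypothetical scaffolding: `route_edit`, `propose_edit`, `validate`, and the `"sonnet"` / `"opus"` labels are placeholder names, not real API calls or model IDs. The caller is assumed to supply the function that actually calls a model and the function that checks a diff (for example, by applying it and running the tests).

```python
from dataclasses import dataclass
from typing import Callable, Optional

SONNET = "sonnet"  # placeholder label, not a real API model ID
OPUS = "opus"      # placeholder label, not a real API model ID
MAX_SONNET_FAILURES = 3  # threshold taken from the report


@dataclass
class RouteResult:
    model: str            # model that produced the final attempt
    attempts: int         # total attempts across both models
    diff: Optional[str]   # validated diff, or None if every attempt failed


def route_edit(
    task: str,
    propose_edit: Callable[[str, str], str],  # (model, task) -> diff
    validate: Callable[[str], bool],          # diff -> passes validation?
) -> RouteResult:
    """Try the cheaper model first; escalate after repeated failures."""
    failures = 0
    while failures < MAX_SONNET_FAILURES:
        diff = propose_edit(SONNET, task)
        if validate(diff):
            return RouteResult(SONNET, failures + 1, diff)
        failures += 1

    # Sonnet failed validation 3 times: make one escalated attempt on Opus.
    diff = propose_edit(OPUS, task)
    return RouteResult(OPUS, failures + 1, diff if validate(diff) else None)
```

A quick way to exercise the sketch is with stubs, for example `route_edit("refactor", lambda m, t: f"{m}-diff", lambda d: d.startswith("opus"))`, which forces the escalation path. Making a single Opus attempt after escalation is a design choice of this sketch; the report does not say how many Opus retries to allow.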
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:31:50.447835+00:00— report_created — created