Report #94743
[cost\_intel] Claude 3.5 Sonnet failure on code refactoring requiring architectural reasoning across 10\+ files
Use o1-preview or o1 for refactoring tasks that require more than three hops of reasoning across files \(e.g., 'migrate from REST to GraphQL affecting 15 controllers'\). Sonnet's pass@5 drops below 40% on edits spanning 5\+ files, while o1 stays above 75%, which this report attributes to o1 reasoning through the change before it produces output.
Journey Context:
Teams attempt large refactors with Sonnet to save costs \($15 vs $60 per 1M output tokens at list price; $3 is Sonnet's input price\), but it misses cross-file side effects. The failure signature is 'compiles but breaks runtime contracts' or 'imports reference deleted files'. o1's reasoning tokens catch architectural inconsistencies before generation. The roughly 4x output-price premium \(higher in practice, because o1 bills its reasoning tokens as output\) has to be weighed against regression bugs that cost $X in downtime.
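Whichever model produces the refactor, the 'imports reference deleted files' signature can be checked mechanically before merging. Below is a minimal sketch for a Python codebase, using only the standard library. The function name `find_dangling_imports` and the project layout are hypothetical, not something described in this report. It checks only absolute imports whose top-level package still exists under the given root. Relative imports and runtime-contract breakage are out of scope.

```python
import ast
from pathlib import Path


def find_dangling_imports(root):
    """Return (file, module) pairs where a local absolute import has no file behind it.

    Hypothetical helper: only modules whose top-level name exists under `root`
    are checked, so stdlib and third-party imports are ignored. Relative
    imports (level > 0) are skipped to keep the sketch short.
    """
    root = Path(root)
    # Names that count as "local": top-level .py files and directories.
    local_top_levels = {p.stem for p in root.glob("*.py")} | {
        p.name for p in root.iterdir() if p.is_dir()
    }
    dangling = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                modules = [node.module]
            else:
                continue
            for module in modules:
                parts = module.split(".")
                if parts[0] not in local_top_levels:
                    continue  # not a local module, nothing to verify
                candidate = root.joinpath(*parts)
                # A module resolves to either pkg/mod.py or a package directory.
                if not (candidate.with_suffix(".py").exists() or candidate.is_dir()):
                    dangling.append((path.relative_to(root).as_posix(), module))
    return dangling
```

Running this over the repository root after a model-generated refactor gives a list of files to fix before the change is reviewed. An empty list only rules out this one failure signature, not the 'compiles but breaks runtime contracts' case.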
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:36:25.704757+00:00— report_created — created