Report #35163
[cost\_intel] Applying reasoning models to large-scale cross-file code refactoring requiring holistic architecture understanding
Use reasoning models for isolated algorithmic logic \(LeetCode-hard problems, complex regex\); use cheap instruct models with RAG for cross-file refactoring \(e.g. moving functions between 15 files\). Reasoning models excel at depth \(logic\), not breadth \(architecture\).
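The rule of thumb above could be sketched as a small routing function. This is an illustrative sketch only: the `Task` fields, the `choose_model` name, the returned labels, and the `isolated_file_limit` default are all hypothetical and do not come from any real library or from the report itself.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a coding task to be routed."""
    files_touched: int    # number of files the change must edit
    is_algorithmic: bool  # True for deep-logic work (e.g. DP, complex regex)


def choose_model(task: Task, isolated_file_limit: int = 1) -> str:
    """Return a hypothetical model label for the task.

    Depth (isolated algorithmic logic) goes to a reasoning model;
    breadth (cross-file refactoring) goes to a cheap instruct model with RAG.
    The default limit of one file is an assumption, not a measured threshold.
    """
    if task.is_algorithmic and task.files_touched <= isolated_file_limit:
        return "reasoning"
    return "instruct+rag"


# Example: a single-function algorithm fix vs. a 15-file function move.
single_function_fix = choose_model(Task(files_touched=1, is_algorithmic=True))
cross_file_move = choose_model(Task(files_touched=15, is_algorithmic=False))
```

In practice the limit would need tuning against your own task mix; the point is only that the routing decision depends on breadth and logical depth, not on task size alone.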
Journey Context:
Reasoning models optimize for deep logical chains, but they operate within a 128k-token context window that codebase-wide context fills quickly. On SWE-bench \(GitHub issue resolution\), o1-preview shows high success on bugs localized to a single function but significantly lower success on issues requiring synchronized changes across 5\+ files, where it often hallucinates file dependencies due to context compression. Filling that context is also expensive: at reasoning-model prices \($60/1M tokens\), full-repo analysis is prohibitively costly compared with embedding-based retrieval \($0.02/1M tokens for embeddings\) \+ cheap model editing. For instruct models, the quality cliff is steep on algorithmic complexity \(dynamic programming\) but shallow on "find all occurrences of X and update imports" \(pattern matching\). Signature for reasoning: the problem involves nested logical constraints \(constraint satisfaction\). Signature for cheap\+RAG: the problem requires holistic understanding of >20 files simultaneously.
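To make the cost gap concrete, here is a back-of-the-envelope sketch using the two per-token prices and the 128k window quoted above. The repository size is a made-up assumption, and the $60 figure is treated as a flat per-token price; real pricing usually distinguishes input from output tokens, so treat the result as an order-of-magnitude estimate rather than a quote.

```python
import math

# USD per 1M tokens, taken from the figures in this report.
REASONING_PRICE_PER_M = 60.00
EMBEDDING_PRICE_PER_M = 0.02
CONTEXT_WINDOW_TOKENS = 128_000

# Hypothetical repository size in tokens (an assumption for illustration).
REPO_TOKENS = 2_000_000


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` once at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# How many full context windows it takes to read the repo once.
windows_needed = math.ceil(REPO_TOKENS / CONTEXT_WINDOW_TOKENS)

# One pass over the whole repo with each approach.
reasoning_cost = cost_usd(REPO_TOKENS, REASONING_PRICE_PER_M)
embedding_cost = cost_usd(REPO_TOKENS, EMBEDDING_PRICE_PER_M)

print(f"windows needed: {windows_needed}")
print(f"reasoning pass: ${reasoning_cost:.2f}")
print(f"embedding pass: ${embedding_cost:.2f}")
```

The embedding figure covers indexing only; the cheap model's editing calls on the retrieved snippets add to it, but they operate on a small fraction of the repository's tokens.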
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:29:50.288025+00:00— report_created — created