Report #194
[research] Which local/open-weight model should I run for coding assistant tasks?
Size the model to your available VRAM first, then use the Aider leaderboard to compare candidates at that size. In 2025-2026, the Qwen2.5-Coder / Qwen3-Coder family is the practical default for local coding: 7B/8B for ~8 GB rigs, 14B/16B for ~16 GB, and 32B for 24 GB+. Reserve reasoning models like DeepSeek-R1 or QwQ for hard debugging and refactoring; they are a poor fit for routine autocomplete or quick edits.
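The size tiers above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, the hard cutoffs at 8, 16, and 24 GB are a simplification of the report's approximate figures, and the 4-bit quantization default in the footprint estimate is an assumption that the report does not state.

```python
def pick_size_tier(vram_gb: float) -> str:
    """Map available VRAM (GB) to the model-size tiers named in this report.

    The cutoffs are treated as hard thresholds here for simplicity;
    the report gives them as approximate values (~8 GB, ~16 GB, 24 GB+).
    """
    if vram_gb >= 24:
        return "32B"
    if vram_gb >= 16:
        return "14B/16B"
    if vram_gb >= 8:
        return "7B/8B"
    return "below 8 GB: not covered by this report"


def approx_weight_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Rough footprint of the weights alone: parameters * bits / 8.

    Assumption: 4-bit quantization by default. This excludes the KV cache,
    activations, and runtime overhead, so real usage will be higher.
    """
    return params_billion * bits_per_weight / 8.0


if __name__ == "__main__":
    for vram in (8, 16, 24):
        print(f"{vram} GB -> {pick_size_tier(vram)}")
    print(f"32B at 4-bit: about {approx_weight_gb(32):.0f} GB of weights")
```

Under the 4-bit assumption, a 32B model needs roughly 16 GB for weights alone, which is consistent with placing it in the 24 GB+ tier once context and overhead are added.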
Journey Context:
General chat models (Llama, Mistral) consistently underperform code-specialized models at the same parameter count, and fill-in-the-middle (FIM) support is required for IDE-style autocomplete. MoE models such as DeepSeek-Coder-V2 Lite activate only a fraction of their parameters per token, so they can deliver good quality at lower per-token compute than a dense model of the same total size; the full set of weights still has to fit in memory. The common mistake is chasing headline HumanEval scores: multi-file editing is better predicted by agentic/edit benchmarks like Aider's. Match the model size to your hardware first, then to the task.
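To make the FIM requirement concrete, here is a minimal sketch of how an autocomplete prompt is assembled. It assumes the special tokens used by the Qwen2.5-Coder family; other model families use different tokens, so check the model card before reusing this. The function name `build_fim_prompt` is hypothetical.

```python
# Assumption: Qwen2.5-Coder style FIM special tokens.
# Other code models define their own tokens and ordering.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Build a fill-in-the-middle prompt.

    The model sees the code before the cursor (prefix) and after the
    cursor (suffix), and is asked to generate the missing middle.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    before_cursor = "def add(a, b):\n    return "
    after_cursor = "\n\nprint(add(1, 2))\n"
    print(build_fim_prompt(before_cursor, after_cursor))
```

A model without FIM training can only continue from the prefix and never sees the code after the cursor, which is why plain chat models tend to fit poorly in editor autocomplete.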
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T21:41:40.272545+00:00— report_created — created