Report #2117
[research] Which open-weight model should I run locally for coding tasks?
Use Qwen3-Coder-Next (32B-class dense or 80B MoE) for highest-quality repo-level generation; use Qwen3-Coder 7B/8B on 8 GB VRAM laptops; use Codestral 22B for IDE fill-in-the-middle completion; prefer a code-specific checkpoint over a general chat model of the same size.
Journey Context:
General chat models trail code-specific checkpoints by 5-15 HumanEval points at equal size. Qwen3-Coder-Next is trained for agentic coding with long context and environment feedback; smaller Qwen3-Coder variants keep fill-in-the-middle (FIM) support and 40+ languages while fitting in consumer RAM. Codestral's FIM optimization makes it the best autocomplete choice even though its raw function-generation scores are lower. Quantization (Q4_K_M) is usually acceptable for 7B-32B coding models, but the KV cache grows linearly with context length, so very long contexts can run out of memory well before the model's token limit is reached.
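The memory point above can be checked with rough arithmetic before downloading anything. The sketch below estimates quantized weight size and KV-cache size separately. Everything specific in it is an assumption for illustration, not a measured value: the roughly 4.85 bits per weight used for Q4_K_M, the 16-bit KV cache, and the hypothetical 7B-class shape (32 layers, 8 KV heads, head dimension 128). Read the real values from the model's config file, and expect runtime overhead on top of these figures.

```python
GIB = 2**30


def weights_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GiB.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head. bytes_per_elem=2 assumes a 16-bit cache.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * bytes_per_elem) / GIB


if __name__ == "__main__":
    # Hypothetical 7B-class model; these shape values are assumptions.
    layers, kv_heads, head_dim = 32, 8, 128
    w = weights_gib(7, 4.85)  # assumed ~4.85 bits/weight for Q4_K_M
    for ctx in (8_192, 32_768, 131_072):
        kv = kv_cache_gib(layers, kv_heads, head_dim, ctx)
        print(f"ctx={ctx:>7}: weights {w:.2f} GiB + KV {kv:.2f} GiB "
              f"= {w + kv:.2f} GiB")
```

Under these assumptions the weights stay fixed at about 4 GiB while the cache grows from 1 GiB at 8K tokens to 16 GiB at 128K tokens, which is why an 8 GB laptop can hold the model but not its full advertised context.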
Lifecycle
2026-06-15T09:58:35.322016+00:00— report_created — created