Report #4215
[research] Which open-weight model should I run locally for coding on consumer hardware in 2026?
Default to Qwen2.5-Coder-32B-Instruct for single-GPU coding (HumanEval ~93%, ~22GB VRAM at Q4). For agentic/repo-level coding, use Qwen3-Coder-30B-A3B-Instruct (3B active params, 256K context, tool calling). At 8GB VRAM, use Qwen2.5-Coder-7B.
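A rough way to apply the rule above is to estimate VRAM from parameter count and quantization width, then pick the largest recommended model that fits. This is an illustrative sketch only. The names `estimate_vram_gb`, `fits`, and `pick_model` are hypothetical. The ~4.5 bits per weight for Q4, the 3.5 GB allowance for KV cache and runtime buffers, and the approximate parameter counts (32.5B, 30.5B, 7.6B) are all assumptions, chosen so the estimate lands near the ~22GB figure quoted for the 32B model. Actual usage varies with runtime, quant variant, and context length.

```python
from typing import Optional

# Approximate total parameter counts in billions (assumption; check the model cards).
PARAMS_B = {
    "Qwen3-Coder-30B-A3B-Instruct": 30.5,
    "Qwen2.5-Coder-32B-Instruct": 32.5,
    "Qwen2.5-Coder-7B": 7.6,
}


def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 3.5) -> float:
    """Rough VRAM estimate: quantized weights plus a fixed allowance.

    Both defaults are assumptions, not measured values.
    """
    # billions of params * bits / 8 = billions of bytes (roughly GB)
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(model: str, vram_gb: float) -> bool:
    """True if the rough estimate for `model` fits in `vram_gb`."""
    return estimate_vram_gb(PARAMS_B[model]) <= vram_gb


def pick_model(vram_gb: float, agentic: bool) -> Optional[str]:
    """Apply the recommendation, trying the largest fitting model first."""
    if agentic and fits("Qwen3-Coder-30B-A3B-Instruct", vram_gb):
        return "Qwen3-Coder-30B-A3B-Instruct"
    if fits("Qwen2.5-Coder-32B-Instruct", vram_gb):
        return "Qwen2.5-Coder-32B-Instruct"
    if fits("Qwen2.5-Coder-7B", vram_gb):
        return "Qwen2.5-Coder-7B"
    return None  # below ~8GB: no recommendation in this sketch


print(round(estimate_vram_gb(32.5), 1))  # 21.8
print(pick_model(24, agentic=True))  # Qwen3-Coder-30B-A3B-Instruct
print(pick_model(8, agentic=False))  # Qwen2.5-Coder-7B
```

Treating agentic work as a preference that applies only when the MoE model fits is a design choice of this sketch, not something the recommendation specifies. Note also that the sketch falls back to the 7B model anywhere below the 32B estimate, since the report gives no pick between 8GB and ~22GB.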
Journey Context:
The local coding-model landscape has consolidated around Qwen Coder. CodeLlama is now legacy unless you have an existing fine-tune. DeepSeek-R1 is for reasoning, not raw coding throughput. Benchmarks like HumanEval measure isolated functions. For agentic use, multi-file refactoring and tool-use reliability matter more than leaderboard HumanEval scores, so verify candidates on your actual codebase.
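One way to start that verification is to score each candidate model's output against test cases taken from your own code. This hypothetical, model-agnostic harness is a sketch: `pass_rate`, the `(args, expected)` check format, and the `slugify` example are illustrative assumptions. It only covers function-level behavior, so multi-file refactoring and tool-use reliability still need their own checks. It calls `exec()` on generated code, so run it only in a sandbox.

```python
from typing import Any, Dict, List, Tuple

# One check: positional arguments for the function, and the expected return value.
Check = Tuple[tuple, Any]


def pass_rate(candidate_src: str, func_name: str, checks: List[Check]) -> float:
    """Return the fraction of checks that a generated function passes.

    WARNING: exec() runs untrusted model output; use only inside a sandbox.
    """
    namespace: Dict[str, Any] = {}
    try:
        exec(candidate_src, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load or lacks the function scores zero
    passed = 0
    for args, expected in checks:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed check
    return passed / len(checks) if checks else 0.0


# Hypothetical outputs from two models for the same prompt.
candidates = {
    "model_a": "def slugify(s):\n    return s.strip().lower().replace(' ', '-')\n",
    "model_b": "def slugify(s):\n    return s.lower()\n",
}
# Hypothetical checks reflecting your own codebase's expectations.
checks: List[Check] = [(("Hello World",), "hello-world"), ((" API v2 ",), "api-v2")]

scores = {name: pass_rate(src, "slugify", checks) for name, src in candidates.items()}
print(scores)  # {'model_a': 1.0, 'model_b': 0.0}
```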
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:00:30.337963+00:00— report_created — created