Report #340
[research] Which open-weight coding LLM should I run locally in 2026?
On a single 24 GB GPU, prefer dense code-specific checkpoints like Qwen3-Coder 32B or Qwen3.6-27B, quantized to roughly 4 bits so the weights fit in memory. If you have server GPUs, MoE options such as DeepSeek-V4-Flash or Kimi K2.6 give frontier capability with fewer active parameters. For IDE fill-in-the-middle (FIM) completion, keep a small Qwen3-Coder 7B/8B or Codestral loaded. Benchmark with SWE-bench Verified or LiveCodeBench, not HumanEval.
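The 24 GB recommendation can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement: it counts only weight bytes (parameters × bits per weight / 8) and adds a flat allowance for KV cache and runtime buffers. The function names, the 3 GiB default allowance, and treating the card as 24 GiB are assumptions made for illustration. Real quantization formats also store scales, so their effective bits per weight are somewhat above the nominal figure.

```python
def weight_memory_gib(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    For MoE models pass the TOTAL parameter count: every expert must be
    resident in memory even though only a few are active per token.
    """
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(total_params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 3.0) -> bool:
    """Rough fit check. overhead_gib is an assumed allowance for KV cache
    and runtime buffers; it grows with context length in practice."""
    needed = weight_memory_gib(total_params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    for params, bits in [(32, 16), (32, 8), (32, 4), (27, 4)]:
        gib = weight_memory_gib(params, bits)
        print(f"{params}B dense @ {bits}-bit: {gib:.1f} GiB weights, "
              f"fits in 24 GiB: {fits(params, bits, 24)}")
```

Under these assumptions a 32B dense model needs about 14.9 GiB of weights at 4 bits and fits, while the same model at 8 or 16 bits does not. A hypothetical MoE with 100B total parameters would need about 46.6 GiB at 4 bits regardless of how few parameters are active per token, which is why the MoE options are steered toward server GPUs.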
Journey Context:
HumanEval is saturated and mostly Python; it no longer discriminates between strong models. Real engineering ability is better measured by SWE-bench Verified (real GitHub issue resolution) and LiveCodeBench (ongoing competitive programming). Code-specific models consistently outperform general chat models at the same active parameter count because they are trained on code-dense corpora and support FIM. MoE models deliver frontier quality with lower active-parameter cost but need more total memory and better serving infrastructure; dense models are simpler to run on a single consumer GPU.
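To make the FIM point concrete, here is a minimal sketch of how an editor plugin might assemble a fill-in-the-middle prompt in prefix-suffix-middle order. The sentinel strings used as defaults are the ones published for Qwen2.5-Coder; whether the models named in this report use the same strings is an assumption, so check each model's tokenizer configuration before relying on them. FimTokens and build_fim_prompt are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    # Defaults follow Qwen2.5-Coder's sentinels; other models may differ
    # (assumption: verify against the tokenizer config of the model you run).
    prefix: str = "<|fim_prefix|>"
    suffix: str = "<|fim_suffix|>"
    middle: str = "<|fim_middle|>"


def build_fim_prompt(before_cursor: str, after_cursor: str,
                     tokens: FimTokens = FimTokens()) -> str:
    """Prefix-suffix-middle layout: the model is expected to generate the
    text that belongs between before_cursor and after_cursor."""
    return (f"{tokens.prefix}{before_cursor}"
            f"{tokens.suffix}{after_cursor}"
            f"{tokens.middle}")


if __name__ == "__main__":
    before = "def add(a, b):\n    return "
    after = "\n\nprint(add(1, 2))\n"
    print(build_fim_prompt(before, after))
```

The resulting string is sent as a raw completion prompt, without a chat template, and generation is usually stopped at the model's end-of-text token. A general chat model without FIM training has no such sentinels, which is one practical reason to keep a small code-specific model loaded for IDE completion.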
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T04:40:51.161997+00:00 - report_created - created