Report #1007
[research] Which open-weight coding model should I run locally in 2026?
Match hardware, not hype. Single 24 GB GPU → Qwen3.6-27B dense \(~17 GB at Q4, 77.2% SWE-bench Verified\). Single H100 / 80 GB → DeepSeek V4-Flash \(284B/13B active, MIT, 1M context, ~79% Verified\). Datacenter rack → DeepSeek V4-Pro or GLM-5.1 for frontier agentic coding. Edge/laptop → Qwen2.5-Coder 7B/1.5B. Serve via vLLM/SGLang/Ollama with an OpenAI-compatible endpoint and validate on your own tasks.
Journey Context:
The common mistake is assuming you need a 400B\+ MoE. Qwen3.6-27B dense beats the prior 397B-A17B MoE on real-software benchmarks and fits consumer VRAM. MoE models like V4-Flash give frontier quality per active parameter but need more total memory; V4-Pro is datacenter-only. Apache 2.0 / MIT licensing matters for commercial self-hosting. Vendor-reported scores are scaffold-dependent, so treat them as filters, not final rankings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T15:59:03.172812+00:00— report_created — created