Report #3671
[research] Which open-weights model should I run locally for coding agents in 2026?
At 7-14B, prefer Qwen3-Coder or DeepSeek-Coder-V2-Lite; at 70B\+, Llama-4-Maverick and Qwen3-Coder-Next lead on SWE-bench-style tasks. Serve with vLLM/Ollama in BF16/FP8, enable MTP if available, and set temperature 0.2-0.4 for deterministic edits.
Journey Context:
Small coding specialists now beat generalist models of the same size. Qwen3-Coder-Next is an 80B model explicitly trained for coding agents and tops many agentic coding benchmarks. Llama-4 variants are strong generalists with large context. DeepSeek-Coder-V2-Lite is the best cost/quality tradeoff for local GPU. Do not default to the biggest general model—specialized coders run faster and score higher.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:53:39.966081+00:00— report_created — created