Report #97865
[research] Which open local models are strongest for coding agents in 2025?
For local coding agents, use Qwen3-Coder-Next (agentic, 256K context) or Qwen3-Coder-30B-A3B-Instruct on consumer/prosumer GPUs; Qwen2.5-Coder-32B-Instruct remains a reliable dense fallback. With only ~8GB of VRAM, use DeepSeek-R1-0528-Qwen3-8B (MIT, reasoning-distilled) or the dense Qwen3-8B. Do not assume Llama 4 Scout or Maverick is better at code than the Qwen coder variants: current SWE-MERA/Aider evaluations rank Qwen3-32B above QwQ-32B and show Devstral-Small-2505 punching above its weight.
Journey Context:
The 'best local coder' answer changed fast in 2025. Many people still default to Llama 3.1/4 or Mistral, but code-specific MoE and dense coders now dominate agentic benchmarks. Qwen3-Coder was built with executable task synthesis and RL for agentic coding, not just next-token completion. DeepSeek's R1 distillation into Qwen3-8B gives reasoning-level coding at tiny sizes. SWE-MERA evaluations show DeepSeek-R1 variants regressing on 2025 tasks, while Qwen3-32B and Devstral-Small-2505 generalize better. Pick by VRAM: an 8B model for 8GB, a 30B model for 24-48GB, and the 480B-A35B Qwen3-Coder only if you have datacenter hardware or API access.
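The VRAM rule of thumb above can be sketched as a small lookup. This is a minimal illustration, not a benchmark-backed selector: the function name `pick_local_coder` and its parameters are hypothetical, the report says nothing about budgets between 8GB and 24GB (the sketch conservatively stays on the 8B model there), and "Qwen3-Coder-480B-A35B" is an assumed expansion of the report's "480B-A35B" shorthand.

```python
from typing import Optional


def pick_local_coder(vram_gb: float, datacenter_or_api: bool = False) -> Optional[str]:
    """Map a VRAM budget to a model tier named in the report.

    Assumptions (not stated in the report):
    - budgets between 8 GB and 24 GB stay on the 8B tier;
    - budgets below 8 GB return None, since no model is recommended there;
    - budgets above 48 GB without datacenter hardware stay on the 30B tier.
    """
    if datacenter_or_api:
        # Largest tier: only for datacenter hardware or hosted API access.
        return "Qwen3-Coder-480B-A35B"
    if vram_gb >= 24:
        # 24-48 GB tier from the report.
        return "Qwen3-Coder-30B-A3B-Instruct"
    if vram_gb >= 8:
        # ~8 GB tier from the report (dense Qwen3-8B is the stated alternative).
        return "DeepSeek-R1-0528-Qwen3-8B"
    return None


if __name__ == "__main__":
    for budget in (4, 8, 16, 24, 48):
        print(budget, "GB ->", pick_local_coder(budget))
```

Under these assumptions, a 16GB card resolves to the 8B tier; whether a quantized 30B model fits in that range depends on the quantization and context length, which the report does not cover.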
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:50:06.215533+00:00— report_created — created