Report #278
[research] Which open-weight model should I run locally for coding agents in 2025?
Use Qwen2.5-Coder-32B-Instruct as the default local coding model. Its reported results are competitive with GPT-4o on several coding benchmarks, it fits on a single 24GB GPU when quantized to around 4 bits (via vLLM, Ollama, or llama.cpp), it supports up to 128K context, and it is licensed under Apache 2.0. For tighter budgets, Qwen2.5-Coder-14B and 7B are strong for their size class. Avoid generic chat models for code agents.
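A rough back-of-the-envelope check of the 24GB claim, as a sketch rather than a measurement. It counts weight storage only and assumes a nominal 32 billion parameters. The function names and the 24 GiB budget are illustrative. Real quantization formats carry some per-block overhead, and the KV cache and activations need additional memory on top, especially at long context.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width and
    there is no format overhead (real quantized files are a bit larger).
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float = 24.0) -> bool:
    """True if the weights alone fit; leaves no margin for KV cache."""
    return weight_memory_gib(params_billion, bits_per_weight) < vram_gib


if __name__ == "__main__":
    # Nominal 32B parameter count (hypothetical round figure).
    for bits in (16, 8, 4):
        size = weight_memory_gib(32, bits)
        print(f"{bits:>2}-bit: {size:5.1f} GiB, fits in 24 GiB: {fits_in_vram(32, bits)}")
```

Under these assumptions only the 4-bit variant leaves headroom on a 24GB card, so the single-GPU recommendation implicitly means a quantized build.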
Journey Context:
The common mistake is defaulting to Llama 3 or general-purpose Qwen-Instruct for coding. Qwen2.5-Coder was trained on 5.5T tokens of code and code-related text, with infilling (fill-in-the-middle) and repo-level structure, which shows up in agentic editing, debugging, and multi-file refactor performance. MoE models like DeepSeek-Coder-V2 are strong but need more VRAM and more tooling support. Qwen2.5-Coder is the current open-source SOTA for this use case because it scales efficiently across 0.5B–32B and its reported scores beat larger generalist models on HumanEval, MultiPL-E, and LiveCodeBench.
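To make the infilling point concrete, here is a minimal sketch of how a fill-in-the-middle prompt is assembled with the special tokens published for Qwen2.5-Coder. The helper name `build_fim_prompt` is hypothetical. The sketch only builds the prompt string and does not call a model. Infilling of this kind is typically run against the base checkpoints rather than the Instruct ones, so check the model card before relying on it.

```python
# Special tokens used by Qwen2.5-Coder for fill-in-the-middle prompts.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle prompt (hypothetical helper).

    The model is expected to generate the code that belongs between
    `prefix` and `suffix`, continuing after the middle marker.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    before_cursor = "def area(radius):\n    return "
    after_cursor = "\n\nprint(area(2.0))\n"
    print(build_fim_prompt(before_cursor, after_cursor))
```

An editor or agent would send the resulting string as a raw completion request and splice the generated text in at the cursor position.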
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T02:40:18.731761+00:00— report_created — created