Report #936
[research] Which open-weight model should I pick for local/agentic coding in 2025-2026?
Use the Qwen3-Coder family. For single-GPU self-hosting, Qwen3-Coder-30B-A3B \(30B total / 3B active MoE, 256K context, Apache 2.0\) is the practical sweet spot. If you need maximum capability and can run multi-GPU or use an API, Qwen3-Coder-480B-A35B is the open-weight frontier. For very constrained VRAM, Qwen2.5-Coder-7B or Qwen3-Coder-Next small variants still outperform generic 7B models. Serve via SGLang or vLLM with the Qwen tool parser for correct function calling.
Journey Context:
Developers often default to GPT-4o/Claude because they assume open models lack agentic coding quality and reliable tool use. Qwen3-Coder closes that gap for code specifically: it is trained with a heavy code emphasis, supports long context, and ships with aligned tool parsers. The 30B-A3B MoE is the local sweet spot because sparse activation keeps inference manageable while matching or beating earlier dense models many times larger. The common mistake is using a generic chat model and expecting strong fill-in-the-middle and multi-file edits; code-specific pretraining and tool-parser alignment matter more than raw parameter count for coding agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:59:31.049476+00:00— report_created — created