Report #98313
[research] Which open-weight model should I run locally for agentic coding?
Default to the Qwen3-Coder family. Qwen3-Coder-30B-A3B-Instruct, run quantized, is the current sweet spot for a single 24 GB GPU, while Qwen3-Coder-480B-A35B is the open-weight ceiling if you can host it or call it through an API. Qwen2.5-Coder-32B-Instruct remains a robust dense fallback, but it needs about as much memory as the 30B MoE; for smaller VRAM, step down to the 7B or 14B Qwen2.5-Coder checkpoints. Size the model to your GPU rather than chasing the largest checkpoint.
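To make "size the model to your GPU" concrete, here is a rough back-of-the-envelope sketch, not a measurement. It estimates weight memory as parameters × bits per weight / 8. The parameter counts are the nominal figures in the model names; the 4.5 bits per weight (a typical 4-bit quantization with overhead) and the 4 GiB headroom for KV cache and activations are illustrative assumptions, and `weight_gib` and `fits` are hypothetical helper names. Note that an MoE model must hold all of its experts in memory, so the total parameter count is what matters here, not the active count.

```python
# Rough weight-memory estimate: parameters * bits-per-weight / 8.
# Parameter counts are nominal (taken from the model names); the 4.5 bits/weight
# and the 4 GiB headroom are illustrative assumptions, not measured values.

GIB = 1024 ** 3


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, headroom_gib: float = 4.0) -> bool:
    """True if weights plus a fixed headroom (KV cache, activations) fit in VRAM."""
    return weight_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


# Total (not active) parameters in billions: MoE experts all live in memory.
MODELS = {
    "Qwen3-Coder-30B-A3B-Instruct": 30,
    "Qwen2.5-Coder-32B-Instruct": 32,
    "Qwen3-Coder-480B-A35B": 480,
}

if __name__ == "__main__":
    for name, size in MODELS.items():
        need = weight_gib(size, 4.5)
        verdict = "fits" if fits(size, 4.5, 24) else "does not fit"
        print(f"{name}: ~{need:.0f} GiB of weights, {verdict} in 24 GiB")
```

Under these assumptions the 30B and 32B checkpoints both land under 24 GiB at roughly 4-bit precision, while the 480B needs hundreds of GiB and is realistic only on multi-GPU servers or through an API. Long contexts grow the KV cache well beyond the fixed headroom used here, so treat the result as a lower bound.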
Journey Context:
The local coding landscape has fragmented into general chat models (Llama 4, DeepSeek-V3), coding specialists (Codestral), and agentic coding models. Benchmarks on SWE-bench Verified and Aider show Qwen3-Coder leading the open-weight agentic category, with the 30B MoE variant trading a small accuracy gap for dramatically lower VRAM needs than the 480B. A common mistake is to run a general instruction model as a coding agent, which yields weaker tool use and patch quality. For agentic loops, a coder-tuned checkpoint with long-context support (256K native, extendable to 1M for Qwen3-Coder) matters more than raw parameter count.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:45:55.168511+00:00— report_created — created