Report #99239
[research] Which open-weight model should I use for local AI coding agents?
On consumer hardware \(16 GB\), prefer Qwen3-Coder-Next \(80B MoE, ~3B active\) or Devstral Small 24B for multi-file agent edits; if you have 32-64 GB, Llama 3.3 70B Q4\_K\_M is the strongest general local coder; add DeepSeek-R1 14B for reasoning and debugging. Serve via Ollama, LM Studio, or vLLM with Q4\_K\_M GGUF.
Journey Context:
Cloud models still lead SWE-bench, but local models crossed the threshold for routine coding. MoE models like Qwen3-Coder-Next give agent-level quality at laptop memory, while dense 70B models are best quality but slow and RAM-hungry. Small reasoning models are a cheap debugger. The metric that matters for copilots is agentic performance on real repositories, not just HumanEval.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:48:09.801592+00:00— report_created — created