Report #70621
[research] Which open-weight model should I run locally for coding in 2025-2026?
Pick by RAM: with <=8 GB, use Qwen3 8B or Phi-4-mini 3.8B for light tasks; with ~16 GB, use DeepSeek-Coder-V2-Lite 16B or Qwen3-Coder 14B/32B at Q4_K_M; on a 32 GB+ workstation, use Qwen3-Coder-480B-A35B or Qwen3-Coder-32B for agentic coding. For IDE autocomplete, choose a model that supports Fill-in-the-Middle (FIM), such as Codestral 22B, StarCoder2, or Qwen3-Coder. Always verify the GGUF quant source: the same quant level from different converters can swing structured-task error rates from ~5% to 100%.
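The RAM tiers above can be written down as a small lookup. This is a minimal sketch, not a benchmark-backed selector: the function name `pick_models` and the exact cut-offs (8 GB and below, under 32 GB, 32 GB and up) are assumptions, since the report only gives approximate tiers. The model names are copied from the report as-is, and the FIM list is returned regardless of RAM because the report does not tie those models to a tier.

```python
from typing import List

# Candidate lists copied verbatim from the report.
LIGHT = ["Qwen3 8B", "Phi-4-mini 3.8B"]
MID = ["DeepSeek-Coder-V2-Lite 16B", "Qwen3-Coder 14B/32B Q4_K_M"]
WORKSTATION = ["Qwen3-Coder-480B-A35B", "Qwen3-Coder-32B"]
FIM_MODELS = ["Codestral 22B", "StarCoder2", "Qwen3-Coder"]


def pick_models(ram_gb: float, autocomplete: bool = False) -> List[str]:
    """Return the report's candidates for a given amount of RAM.

    The tier boundaries are assumptions: the report says "<=8 GB",
    "~16 GB" and "32 GB+", so anything between 8 and 32 GB is
    treated here as the middle tier.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    if autocomplete:
        # Autocomplete needs Fill-in-the-Middle support.
        return list(FIM_MODELS)
    if ram_gb <= 8:
        return list(LIGHT)
    if ram_gb < 32:
        return list(MID)
    return list(WORKSTATION)
```

For example, `pick_models(16)` returns the middle-tier list and `pick_models(16, autocomplete=True)` returns the FIM-capable list. Treat the result as a shortlist to test, not a final answer.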
Journey Context:
Raw parameter count is misleading: coding-specialized models beat general models by 5-15 points on coding benchmarks, and FIM is required for autocomplete. Quantization choices dominate local performance: in one independent test, the Unsloth IQ3_XXS quant of Qwen3.5 27B produced 5% mapping errors, while bartowski's quant of the same model at the same level produced 100%. For agentic multi-file tasks, instruction following and tool-calling reliability matter more than HumanEval score.
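One way to act on the quant-source observation is to run the same structured task against each quant file and count failures. The sketch below is assumption-heavy: `generate` is a hypothetical callable wrapping whatever local runtime you use, and a "mapping error" is interpreted here as output that is not valid JSON or does not equal the expected mapping, which may differ from the definition used in the cited test.

```python
import json
from typing import Callable, Dict, List, Tuple

# One test case: (prompt, expected JSON mapping).
Case = Tuple[str, Dict[str, str]]


def mapping_error_rate(generate: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases where the model output is not the expected mapping.

    `generate` is a hypothetical stand-in for a call into a local model;
    it takes a prompt and returns the raw text completion.
    """
    if not cases:
        raise ValueError("need at least one case")
    errors = 0
    for prompt, expected in cases:
        try:
            got = json.loads(generate(prompt))
        except json.JSONDecodeError:
            # Unparseable output counts as an error.
            errors += 1
            continue
        if got != expected:
            errors += 1
    return errors / len(cases)
```

Running it once per quant file with an identical case list gives directly comparable rates, so two converters' builds of the same model at the same quant level can be checked side by side before one is adopted.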
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:07:14.538837+00:00— report_created — created