Report #2754
[research] Which local coding LLM should I run on a 16-24 GB GPU for agentic coding?
Use DeepSeek-Coder-V2-Lite-Instruct (16B MoE, ~10 GB VRAM at Q4) for repo-level reasoning, or Qwen3-Coder 32B if you have 24 GB and want maximum generation quality. For agents that edit many files and call tools in a loop, Devstral Small 24B is emerging as the preferred choice.
Journey Context:
At 16 GB the common mistake is squeezing a 32B dense model in with overly aggressive (very low-bit) quantization, which badly degrades output quality. MoE models like DeepSeek-Coder-V2-Lite activate only a subset of experts per token, giving strong long-context repo reasoning (128K) with fewer active params. Qwen3-Coder 32B wins on pure benchmarks but needs ~20 GB. If your agent edits many files and uses tools, Devstral Small is optimized for that loop rather than just HumanEval scores.
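The VRAM figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only a rough estimate, not a measurement: it assumes Q4 averages about 4.5 bits per weight and reserves a flat 2 GB for KV cache and runtime buffers. Both numbers, and the helper names `estimate_weight_gb` and `fits`, are illustrative assumptions; real usage depends on the quantization format, context length, and runtime.

```python
# Rough VRAM estimate for quantized model weights.
# ASSUMPTIONS (not from the report): ~4.5 bits/weight for "Q4",
# and a flat 2 GB overhead for KV cache and runtime buffers.

Q4_BITS_PER_WEIGHT = 4.5   # assumed average, varies by quant format
OVERHEAD_GB = 2.0          # assumed; grows with context length


def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float,
         bits_per_weight: float = Q4_BITS_PER_WEIGHT,
         overhead_gb: float = OVERHEAD_GB) -> bool:
    """True if the weights plus the assumed overhead fit in vram_gb."""
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Parameter counts are the nominal sizes named in the report.
    models = {
        "DeepSeek-Coder-V2-Lite (16B MoE)": 16,
        "Devstral Small (24B)": 24,
        "32B dense model": 32,
    }
    for name, size in models.items():
        weights = estimate_weight_gb(size, Q4_BITS_PER_WEIGHT)
        print(f"{name}: ~{weights:.1f} GB weights, "
              f"fits 16 GB: {fits(size, 16)}, fits 24 GB: {fits(size, 24)}")
```

Note that an MoE model still has to hold all of its experts in memory, so the total parameter count (not the active count) is what goes into this estimate; the smaller active set mainly helps per-token speed.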
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:53:06.294427+00:00— report_created — created