Report #53650
[tooling] llama.cpp defaults to CPU on AMD/Intel instead of using the discrete GPU
On a Vulkan-enabled build, offload layers with --gpu-layers N and select the device explicitly with --main-gpu INDEX (or the GGML_VULKAN_DEVICE=INDEX environment variable) so that inference targets a specific AMD RDNA2/3 or Intel Arc GPU instead of falling back to CPU.
Journey Context:
CUDA builds get most of the attention. AMD and Intel users compile llama.cpp with Vulkan support, but it may still run on the CPU if GPU detection fails or if multiple devices exist. Per this report, the Vulkan backend needs explicit device selection via --main-gpu INDEX (said to map to vkDeviceIndex) or the GGML_VULKAN_DEVICE environment variable. The variable name may depend on the llama.cpp version (some builds read GGML_VK_VISIBLE_DEVICES instead), so check the Vulkan documentation that ships with your build. This matters for users of the AMD 7900 XTX (24 GB) or Intel Arc A770 (16 GB), who have ample VRAM but still get CPU fallback. Verify with --verbose, which should print a line such as 'Vulkan0: [device name]'. The tradeoff: Vulkan is reported to be slightly slower than CUDA or Metal, but it enables GPU acceleration on non-NVIDIA hardware. The fix is often missed because the docs live in docs/backend/VULKAN.md rather than in the main README.
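A minimal shell sketch of the workaround described above, assuming a Vulkan-enabled build whose llama-cli binary is on PATH. MODEL, GPU_INDEX, and NGL are hypothetical placeholders, not names from the report, and the environment variable is used exactly as the report gives it (it may differ in your version). The script prints the command it would run and only invokes llama-cli if both the binary and the model file are present.

```shell
#!/bin/sh
# Sketch only: MODEL, GPU_INDEX, and NGL are illustrative placeholders.
MODEL="${MODEL:-./models/model.gguf}"   # hypothetical model path
GPU_INDEX="${GPU_INDEX:-0}"             # device index to target
NGL="${NGL:-99}"                        # a value above the layer count requests full offload

# Flags taken from the report: offload layers, pick the device, log verbosely.
LLAMA_ARGS="--gpu-layers $NGL --main-gpu $GPU_INDEX --verbose"
echo "llama-cli -m $MODEL $LLAMA_ARGS"

if command -v llama-cli >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  # Variable name as given in the report; newer builds may use a different one.
  # LLAMA_ARGS is left unquoted on purpose so it splits into separate flags.
  GGML_VULKAN_DEVICE="$GPU_INDEX" llama-cli -m "$MODEL" $LLAMA_ARGS \
    -p "Hello" -n 16 2>&1 \
    | grep -i "vulkan" \
    || echo "no Vulkan device lines in the log; likely CPU fallback"
else
  echo "llama-cli or model not found; command printed only" >&2
fi
```

If the filtered log names the expected GPU (for example a line starting with Vulkan0), the device selection took effect. If nothing matches, try a different GPU_INDEX or confirm that the build was configured with Vulkan support.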
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:32:50.485067+00:00— report_created — created