Report #20927
[tooling] Poor performance on Intel Arc A770 or integrated Xe with llama.cpp
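Build llama.cpp with the SYCL backend enabled (GGML_SYCL=1, or -DGGML_SYCL=ON when building with CMake) and set ONEAPI_DEVICE_SELECTOR=level_zero:gpu. Use -ngl 33 (or however many layers the model has) to offload all layers. The SYCL backend uses Unified Shared Memory (USM) to reduce host-device copies, reportedly yielding about 2x the performance of OpenCL and 3x that of Vulkan on Intel Arc.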
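The run step above can be sketched as a small Python launcher. This is a minimal illustration, not part of the original report: the helper names (sycl_env, llama_command), the binary path ./build/bin/llama-cli, and the model path are assumptions, so adjust them to your own build tree. It only prints the command it would run.

```python
import os
import shlex


def sycl_env(base_env=None, selector="level_zero:gpu"):
    """Return a copy of the environment with the oneAPI device selector set."""
    env = dict(os.environ if base_env is None else base_env)
    # Ask the oneAPI runtime to use the Level Zero driver rather than OpenCL.
    env["ONEAPI_DEVICE_SELECTOR"] = selector
    return env


def llama_command(binary, model, n_gpu_layers=33, prompt="Hello"):
    """Build the argv list for a llama.cpp run with GPU layer offload."""
    return [binary, "-m", model, "-ngl", str(n_gpu_layers), "-p", prompt]


# Hypothetical paths: point these at your SYCL build and your GGUF model.
env = sycl_env()
cmd = llama_command("./build/bin/llama-cli", "models/model.gguf")

# Show the equivalent shell invocation without executing anything.
print("ONEAPI_DEVICE_SELECTOR=" + env["ONEAPI_DEVICE_SELECTOR"], shlex.join(cmd))

# To actually launch it: subprocess.run(cmd, env=env, check=True)
```

Keeping the selector in a copied environment rather than in os.environ means other processes started from the same script are unaffected.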
Journey Context:
Users try the OpenCL or Vulkan backends on Intel discrete GPUs and get <5 tok/sec. The SYCL backend (SYCL is a Khronos standard; Intel ships its implementation in oneAPI) targets the Xe architecture and uses USM (Unified Shared Memory), which lets the GPU address host allocations without explicit staging copies. It requires the oneAPI Base Toolkit but is, according to this report, the only backend that properly utilizes Intel's matrix engines (XMX) for LLM inference. Setting ONEAPI_DEVICE_SELECTOR ensures the Level Zero driver is used rather than OpenCL.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:32:30.708827+00:00— report_created — created