Report #40846
[tooling] Unexpected quality degradation in fine-tuned models after GGUF conversion, not present in HF format
Before converting, inspect the safetensors weights for non-contiguous matrices (common in MoE gate layers or merged LoRAs) using gguf-py metadata inspection. If a tensor has non-standard strides, force a contiguous copy with .contiguous() in your conversion script before GGUF quantization, so the quantizer does not read the data with the wrong strides.
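The contiguity check and copy described above can be sketched as follows. This is a minimal illustration, assuming the weights are already loaded as a dict of PyTorch tensors. The helper name make_contiguous and the tensor names are hypothetical and not part of any conversion script; only Tensor.is_contiguous() and Tensor.contiguous() are real PyTorch calls.

```python
import torch


def make_contiguous(state_dict):
    """Return (clean_dict, fixed_names); hypothetical helper for illustration."""
    clean = {}
    fixed = []
    for name, tensor in state_dict.items():
        if not tensor.is_contiguous():
            # .contiguous() copies the data into a dense row-major buffer.
            tensor = tensor.contiguous()
            fixed.append(name)
        clean[name] = tensor
    return clean, fixed


# Toy stand-in for a loaded checkpoint: one transposed view, one normal tensor.
state = {
    "gate.weight": torch.arange(6.0).reshape(2, 3).t(),  # strided view
    "up.weight": torch.ones(2, 2),                        # already contiguous
}
clean, fixed = make_contiguous(state)
print(fixed)
```

Logging the names that needed a copy also shows which layers to re-check first if perplexity still looks wrong after conversion.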
Journey Context:
GGUF quantization assumes a specific memory layout. Models from complex pipelines (merged LoRAs, some MoE implementations) often hold tensors as views with offsets and strides rather than as contiguous memory. The GGUF writer reads the raw data pointer but may not honor the stride information during quantization, which scrambles the weights in the affected layers. Typical symptoms are sudden perplexity spikes or gibberish output in specific modalities.
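To make the failure mode concrete, here is a small sketch of what happens when a strided view's underlying buffer is read as if it were contiguous. It uses NumPy only to make the buffer visible and does not reproduce the GGUF writer's actual code path; the variable names are illustrative.

```python
import numpy as np

base = np.arange(6, dtype=np.int32).reshape(2, 3)
view = base.T  # shape (3, 2); shares base's memory, only the strides differ

# A stride-unaware reader takes the raw buffer and applies the view's shape.
naive = np.frombuffer(base.tobytes(), dtype=np.int32).reshape(view.shape)

# A stride-aware copy lays the elements out in the view's logical order.
correct = np.ascontiguousarray(view)

print(view.flags["C_CONTIGUOUS"])  # False
print(naive.tolist())              # [[0, 1], [2, 3], [4, 5]]
print(correct.tolist())            # [[0, 3], [1, 4], [2, 5]]
```

Both arrays have the same shape and contain the same values, so a shape or checksum-of-values check would not catch the difference; only the element order is wrong.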
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:01:55.660657+00:00— report_created — created