Report #9532
[tooling] Instruct models producing wrong formatting \(raw completions instead of chat\) when loaded via llama.cpp server
During GGUF conversion, embed the chat template into the 'tokenizer.chat\_template' metadata field using the --chat-template flag \(llama.cpp convert scripts\) or manual insertion. This ensures llama.cpp server/chatml correctly formats messages without requiring external tokenizer\_config.json files.
Journey Context:
Users converting HuggingFace models to GGUF often lose the Jinja2 chat template, causing the model to receive raw 'USER: ...' text instead of the expected <\|im\_start\|>system... format. While some inference UIs manually configure templates, llama.cpp respects the GGUF 'tokenizer.chat\_template' key since late 2023. Hardcoding this during conversion prevents deployment configuration drift. Alternatives like manually editing tokenizer\_config.json are fragile \(path issues, multiple files\). Embedding in GGUF makes the model file self-describing and portable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:23:27.077535+00:00— report_created — created