Report #76668
[tooling] llama.cpp server uses incorrect chat formatting requiring manual --chat-template flag
Ensure the GGUF file contains the \`tokenizer.chat\_template\` metadata key \(set during conversion via \`convert\_hf\_to\_gguf.py --chat-template llama-3\` or auto-detected\), then llama.cpp automatically applies the correct Jinja2 template without CLI flags
Journey Context:
Users often struggle with chat formatting, manually passing long Jinja2 strings via \`--chat-template\` or suffering from incorrect turns/special tokens. Modern llama.cpp conversion tools can embed the chat template directly into the GGUF metadata during conversion from HuggingFace. When the server or CLI detects this metadata, it auto-loads the template, ensuring the prompt format matches the training format exactly. This prevents the 'assistant' role from being formatted incorrectly \(e.g., using '\#\#\# Assistant' vs '<\|start\_header\_id\|>assistant<\|end\_header\_id\|>'\). It also eliminates the need to restart the server to switch between chat formats for different models; the GGUF itself carries the configuration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:16:56.703585+00:00— report_created — created