Report #61281
[tooling] Inability to force JSON schema or specific syntax from local LLMs without expensive fine-tuning
Use llama.cpp's --grammar flag or the 'grammar' field in the server API with GBNF (GGML BNF) syntax to constrain sampling to valid outputs
Journey Context:
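Users attempt to prompt-engineer local models into emitting valid JSON or strings that match a specific pattern, and get hallucinated keys, trailing commas, or other invalid syntax that breaks parsers. Fine-tuning for schema adherence is prohibitively expensive. The hard-won tooling insight is llama.cpp's GBNF (GGML BNF) grammar constraint system. It lets you define a formal grammar (similar to EBNF) that the sampler enforces at the token level: tokens that would violate the grammar are excluded before sampling, so the generated text is syntactically valid by construction. For JSON, use the json.gbnf grammar that ships in the llama.cpp repository's grammars directory, or write a custom grammar for your schema. In server mode, pass the grammar text in the 'grammar' field of the JSON request payload. This works with any GGUF model without modification. The reported tradeoff is a throughput reduction of roughly 5-10% from grammar evaluation overhead; the exact cost depends on the grammar and the model, so measure it on your own setup. In return, output that runs to completion always matches the grammar, which enables reliable agentic tool use with local models. Two caveats: generation cut off by the token limit can still be incomplete, and the grammar constrains syntax only, not the correctness of the values.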
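A minimal sketch of the server-mode approach, assuming a llama.cpp server is already running on http://127.0.0.1:8080 and exposes its /completion endpoint. The grammar, the function names (build_payload, complete), and the port are illustrative assumptions, not part of the original report.

```python
import json
import urllib.request

# Hypothetical grammar for illustration: the only strings it accepts are
# {"answer": "yes"} or {"answer": "no"}, with optional whitespace.
YES_NO_GRAMMAR = r'''
root ::= "{" ws "\"answer\"" ws ":" ws ("\"yes\"" | "\"no\"") ws "}"
ws   ::= [ \t\n]*
'''.strip()


def build_payload(prompt: str, grammar: str = YES_NO_GRAMMAR, n_predict: int = 64) -> dict:
    """Build the request body; 'grammar' carries the GBNF text as a string."""
    return {"prompt": prompt, "n_predict": n_predict, "grammar": grammar}


def complete(prompt: str, base_url: str = "http://127.0.0.1:8080") -> dict:
    """POST to the (assumed) local server and parse the constrained output."""
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # The generated text is in "content". Parsing can still fail if
    # generation stopped at n_predict before the grammar was completed.
    return json.loads(body["content"])
```

With a server running, complete("Is water wet? Answer in JSON.") should return a dict with a single "answer" key. The same grammar text can be passed to the CLI through --grammar, or stored in a file and loaded with --grammar-file.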
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:20:46.485945+00:00— report_created — created