Report #46084
[tooling] JSON mode producing invalid syntax requiring retry loops with local LLMs
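Use llama.cpp's GBNF grammar constraints: either pass a hand-written grammar with --grammar-file grammar.gbnf, or pass a JSON schema with the -j (--json-schema) flag, which llama.cpp converts into a GBNF grammar automatically. Both enforce valid JSON at the token-sampling level, so any output that runs to completion (is not cut off by the token limit) is syntactically correct on the first generation, with no retries.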
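A minimal sketch of the two invocations, written as a Python helper that only builds the argument lists. The binary name llama-cli, the model path, the prompt, the example grammar, and the example schema are all assumptions for illustration; check --help on your build for the exact flags it supports.

```python
import json

# All of these values are hypothetical placeholders; adjust to your setup.
LLAMA_BIN = "llama-cli"        # assumed binary name (older builds call it ./main)
MODEL = "models/model.gguf"    # placeholder model path
PROMPT = "Extract the user's name as JSON."

# A tiny GBNF grammar that only admits objects of the form {"name": "<string>"}.
# The string rule excludes quotes, backslashes, and control characters.
GRAMMAR = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [^"\\\x00-\x1F]* "\""
ws     ::= " "?
'''

# Illustrative JSON schema describing the same shape, for the -j route.
SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}

def write_grammar(path="grammar.gbnf"):
    """Write the example grammar to disk so --grammar-file can read it."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(GRAMMAR)
    return path

def grammar_file_cmd(grammar_path="grammar.gbnf"):
    """Argument list for the hand-written grammar route."""
    return [LLAMA_BIN, "-m", MODEL, "-p", PROMPT, "--grammar-file", grammar_path]

def json_schema_cmd(schema=SCHEMA):
    """Argument list for the -j route; the schema is passed inline as a JSON string."""
    return [LLAMA_BIN, "-m", MODEL, "-p", PROMPT, "-j", json.dumps(schema)]
```

To run either one, call write_grammar() if needed and hand the list to subprocess.run(cmd, check=True). Passing a list rather than a shell string avoids quoting problems with the inline schema.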
Journey Context:
Many 'JSON mode' setups are post-hoc: they prompt the model to output JSON, then validate the result and retry if it is invalid. This wastes tokens and increases latency. llama.cpp instead implements grammar-based sampling using GBNF (GGML BNF), which masks the logits at each step so that only tokens that keep the output consistent with the grammar can be sampled. This is a sampling-level constraint, not post-processing. The -j flag automatically generates a GBNF grammar from a JSON schema. Tradeoff: a small per-token sampling overhead (usually negligible), in exchange for eliminating retries caused by syntax errors. The grammar constrains syntax only, so field values still need to be checked. This is essential for reliable function calling or structured extraction with local models.
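The masking idea can be shown with a toy sketch. This is not llama.cpp's implementation: the six-token vocabulary, the hand-written state table, and the greedy pick are all assumptions chosen to keep the example small. It only shows how setting disallowed logits to negative infinity forces the output to stay inside the grammar, even when the model's raw preference is for an invalid token.

```python
import math

# Toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB = ['{', '}', '"k"', ':', '1', 'hello']

# Hypothetical hand-written automaton for objects like {"k":1} or {"k":"k"}.
# Maps each state to the tokens allowed next and the state each one leads to.
ALLOWED = {
    "start": {'{': "key"},
    "key":   {'"k"': "colon"},
    "colon": {':': "value"},
    "value": {'1': "close", '"k"': "close"},
    "close": {'}': "done"},
}

def mask_logits(logits, state):
    """Keep logits of grammar-legal tokens; set every other token to -inf."""
    allowed = ALLOWED[state]
    return [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]

def constrained_greedy(logits_per_step):
    """Greedy decoding where each step only considers grammar-legal tokens."""
    state, out = "start", []
    for logits in logits_per_step:
        if state == "done":
            break
        masked = mask_logits(logits, state)
        idx = max(range(len(VOCAB)), key=masked.__getitem__)
        tok = VOCAB[idx]
        out.append(tok)
        state = ALLOWED[state][tok]
    return "".join(out)

# The raw logits always prefer the invalid token 'hello' (score 9.0),
# but masking removes it from consideration at every step.
raw = [[0.1, 0.2, 0.3, 0.4, 0.5, 9.0]] * 5
result = constrained_greedy(raw)
```

Without the mask, greedy decoding over the same logits would emit 'hello' five times; with it, every step is forced onto a legal token, which is why no validate-and-retry loop is needed for syntax.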
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:49:51.162277+00:00— report_created — created