Report #67647
[tooling] LLM generating malformed JSON or invalid syntax when used for structured data extraction locally
Force valid syntax by passing a GBNF (GGML BNF) grammar to llama.cpp: send the grammar text in the `grammar` field of a `/completion` request, or load it from a file with the `--grammar-file` command-line flag. The grammar constrains the sampler to tokens that keep the output syntactically valid, which eliminates parsing errors and the tokens wasted on retries.
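A minimal sketch of such a request, assuming a llama.cpp server is already running locally. The grammar, the helper names `build_completion_request` and `complete`, and the base URL are illustrative assumptions, not part of the original report; the `prompt`, `grammar`, and `n_predict` request fields and the `content` response field follow the llama.cpp server API.

```python
import json
import urllib.request

# Illustrative GBNF grammar: the output must be {"answer": "yes"} or {"answer": "no"},
# with optional whitespace between tokens. The raw string keeps the backslash
# escapes intact so the GBNF parser sees them unchanged.
ANSWER_GRAMMAR = r'''
root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws     ::= [ \t\n]*
'''


def build_completion_request(prompt, grammar, n_predict=64):
    """Build the JSON body for a grammar-constrained /completion call."""
    return {"prompt": prompt, "grammar": grammar, "n_predict": n_predict}


def complete(base_url, payload, timeout=60):
    """POST the payload to <base_url>/completion and return the generated text."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["content"]
```

With a server listening on an assumed address such as `http://localhost:8080`, `complete("http://localhost:8080", build_completion_request("Is water wet? Answer in JSON.", ANSWER_GRAMMAR))` would return the constrained text.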
Journey Context:
Without constraints, models may produce invalid JSON (trailing commas, unescaped quotes, mismatched brackets) that requires fragile regex fixes or re-prompting. GBNF grammars (similar to EBNF) define the valid output sequences; at each step the sampler masks the logits so that only tokens that keep the output consistent with the grammar can be chosen. Output that runs to completion therefore parses on the first try, although generation cut off by the token limit can still leave it incomplete. Implementation: llama.cpp has its own GBNF parser; the user provides a grammar whose start rule is `root`, such as `root ::= object` followed by the rules it references. Pre-built grammars are available in `llama.cpp/grammars/` (json.gbnf, list.gbnf, etc.). Performance: the grammar narrows the set of candidate tokens, but checking candidates against it adds per-token work of its own, so do not rely on a speedup; the main benefit is correctness. Common pitfalls: overly restrictive grammars that do not allow for whitespace or arbitrary string content, and regex post-processing where a grammar would be cleaner.
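Even with a grammar in place, it is worth parsing the result defensively, since a generation that stops at the token limit may end mid-object. The following is a minimal sketch under that assumption; `load_grammar` and `parse_constrained_output` are hypothetical helper names, and the path passed to `load_grammar` depends on where llama.cpp is checked out.

```python
import json
from pathlib import Path


def load_grammar(path):
    """Read a .gbnf grammar file (for example a pre-built one) as text."""
    return Path(path).read_text(encoding="utf-8")


def parse_constrained_output(text):
    """Parse model output as JSON.

    Returns (value, None) on success, or (None, reason) when the text is
    incomplete or invalid, e.g. because generation hit the token limit.
    """
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"incomplete or invalid JSON at char {exc.pos}: {exc.msg}"
```

A caller could retry with a larger token budget when the second element of the returned pair is not `None`, rather than attempting to repair the text with regexes.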
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:01:49.556840+00:00— report_created — created