Report #48670
[tooling] Wasting tokens on retries when forcing JSON or specific syntax via regex post-processing
Use `--grammar-file grammar.gbnf` with llama.cpp to constrain sampling at the token level to the rules of a GBNF grammar, so the first generation is syntactically valid without retries (provided generation is not cut off by the token limit first).
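A minimal sketch of this workflow in Python. It assumes a hypothetical tool-call schema `{"tool": "search" | "fetch", "query": <string>}`, writes a schema-specific GBNF file, and builds the llama.cpp command line. The rule names, tool names, file paths and model path are illustrative. The binary name `llama-cli` is an assumption (older llama.cpp builds shipped it as `main`). `-m`, `-p`, `-n` and `--grammar-file` are llama.cpp CLI options.

```python
import shlex
from pathlib import Path

# Schema-specific GBNF grammar for a hypothetical tool call:
#   {"tool": "search" | "fetch", "query": "<string>"}
# Rule names and tool names are illustrative assumptions.
TOOL_CALL_GBNF = r'''
root   ::= "{" ws "\"tool\":" ws tool "," ws "\"query\":" ws string ws "}"
tool   ::= "\"search\"" | "\"fetch\""
string ::= "\"" ( [^"\\] | "\\" ["\\/bnrt] )* "\""
ws     ::= [ \t\n]*
'''


def build_command(model_path: str, prompt: str, grammar_path: Path) -> list[str]:
    # "llama-cli" is an assumed binary name; the flags are llama.cpp CLI options.
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", "256",  # token limit: must leave room for the whole object
        "--grammar-file", str(grammar_path),
    ]


if __name__ == "__main__":
    grammar_path = Path("tool_call.gbnf")
    grammar_path.write_text(TOOL_CALL_GBNF.strip() + "\n")
    cmd = build_command("model.gguf", "Search the llama.cpp docs for GBNF.", grammar_path)
    print(shlex.join(cmd))
```

Running the script writes `tool_call.gbnf` and prints the command. With this grammar loaded, the model can still choose the tool, the query text and whitespace, but it cannot change the keys or the object structure.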
Journey Context:
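Post-processing with regex fails when models produce malformed JSON (missing quotes, trailing commas, invalid escapes), and every failure costs a full retry. GBNF (GGML BNF) is integrated into the sampler. At each step, tokens that would violate the grammar are masked out before sampling, so the output is syntactically correct by construction. This is critical for agent tool calling, where JSON must be parseable without retries. Write grammars for the exact schema (specific keys, value types and allowed values) instead of relying on a generic JSON grammar. A generic grammar only guarantees well-formed JSON, so the model can still spend tokens on wrong keys or types that fail validation. A schema-specific grammar rules those out as well. One caveat: if generation hits the token limit (`-n`) before the grammar reaches an accepting state, the output is a valid prefix but still incomplete.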
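The masking step can be illustrated with a toy greedy decoder in Python. This is a simplified sketch, not llama.cpp's implementation. The "grammar" is a hypothetical set of two accepted strings checked by prefix matching, and the vocabulary and scores are made up so that the unconstrained model prefers malformed tokens. Masked tokens get a logit of `-inf`, so they can never win, and generation may only stop once the output is a complete accepted string.

```python
import json
import math

# Toy stand-in for a schema-specific grammar: the complete strings it accepts.
# Illustration only: llama.cpp matches GBNF rules incrementally instead of
# enumerating the language like this.
ALLOWED = {'{"tool":"search"}', '{"tool":"fetch"}'}

# Hypothetical vocabulary and a fake "model" whose raw preferences are malformed:
# a trailing comma and a single quote score highest.
VOCAB = ["{", "}", ",", "'", ":", '"tool"', '"search"', '"fetch"', '"query"']
SCORES = {",": 5.0, "'": 4.0, '"query"': 3.0, '"fetch"': 2.5, '"search"': 2.0,
          "{": 1.0, "}": 1.0, ":": 1.0, '"tool"': 1.0}


def toy_logits(prefix: str) -> list[float]:
    # The fake model ignores context and always returns the same scores.
    return [SCORES[tok] for tok in VOCAB]


def is_valid_prefix(text: str) -> bool:
    # True if `text` can still be extended into some accepted output.
    return any(s.startswith(text) for s in ALLOWED)


def mask_logits(prefix: str, logits: list[float]) -> list[float]:
    # Tokens that would leave the grammar get -inf, so they can never be picked.
    return [
        score if is_valid_prefix(prefix + tok) else -math.inf
        for score, tok in zip(logits, VOCAB)
    ]


def greedy(max_steps: int, constrained: bool) -> str:
    out = ""
    for _ in range(max_steps):
        if constrained and out in ALLOWED:
            break  # accepting state reached: ending generation is allowed
        logits = toy_logits(out)
        if constrained:
            logits = mask_logits(out, logits)
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        if logits[best] == -math.inf:
            raise RuntimeError("no grammar-valid continuation")
        out += VOCAB[best]
    # If max_steps runs out first, `out` is a valid prefix but may be incomplete.
    return out


if __name__ == "__main__":
    print(greedy(5, constrained=False))  # ,,,,,
    result = greedy(5, constrained=True)
    print(result, json.loads(result))  # {"tool":"fetch"} {'tool': 'fetch'}
```

With `max_steps=3` the constrained decoder returns `{"tool":`, a valid prefix that `json.loads` still rejects, which is why the token limit has to leave room for the whole object.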
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:10:14.666589+00:00— report_created — created