Report #46145
[frontier] JSON mode fails for complex nested tool outputs with interdependencies and validation rules
Use grammar-based constrained decoding (context-free or PEG grammars) instead of post-generation JSON Schema or regex validation
Journey Context:
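JSON mode forces the LLM to emit syntactically valid JSON, but it cannot enforce constraints across fields, such as 'exactly one of field_a or field_b must be present' or 'start_date must be before end_date'. Catching these in post-validation means re-prompting on failure, which wastes tokens and adds latency. Grammar-constrained decoding (e.g., the Outlines library, which accepts context-free grammars in Lark's EBNF format, or llama.cpp's GBNF grammars) masks the token sampler at each step so that only tokens that keep the output inside the grammar can be chosen. Output that runs to completion therefore conforms to the grammar, including recursive structures and structural dependencies such as exactly-one-of. Value comparisons such as date ordering are generally not expressible in a context-free grammar and still need a lightweight check after generation. Where JSON Schema is applied as a post-generation validator, failures surface only after the full output has been produced; grammar constraints act at the token level, so structural failures and the retry loops they cause are avoided. Tradeoff: grammars must be written and compiled, which adds some upfront compute, in exchange for structurally valid output.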
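The token-level masking described above can be illustrated with a minimal, library-free sketch. Everything in it is an assumption made for illustration: the seven-token vocabulary, the hand-written state table standing in for a compiled grammar, and `mock_logits`, which returns random scores in place of a real model. It is not how Outlines or llama.cpp implement constrained decoding, only a toy version of the same idea applied to the 'exactly one of field_a or field_b' rule.

```python
import json
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["{", "}", '"field_a"', '"field_b"', ":", "1", ","]

# Toy grammar: object := "{" ( "field_a" | "field_b" ) ":" "1" "}"
# Hand-encoded as a state table: state -> {allowed token: next state}.
# A real system would compile this table from a grammar file.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"field_a"': 2, '"field_b"': 2},  # exactly one of the two keys
    2: {":": 3},
    3: {"1": 4},
    4: {"}": 5},
}
ACCEPT_STATE = 5


def mock_logits(prefix, rng):
    """Stand-in for a model: assigns a random score to every vocabulary token."""
    return {token: rng.random() for token in VOCAB}


def constrained_decode(seed=0):
    """Greedy decoding where tokens outside the grammar are masked out."""
    rng = random.Random(seed)
    state, output = 0, []
    while state != ACCEPT_STATE:
        logits = mock_logits(output, rng)
        allowed = TRANSITIONS[state]
        # The mask: only tokens the grammar permits in this state compete,
        # no matter how highly the model scores the others.
        token = max(allowed, key=lambda t: logits[t])
        output.append(token)
        state = allowed[token]
    return "".join(output)


if __name__ == "__main__":
    text = constrained_decode(seed=0)
    print(text, json.loads(text))
```

Because the mask is applied before a token is chosen, every completed output parses as JSON and contains exactly one of the two keys, so no retry loop is needed for that rule. A date-ordering rule would still be checked on the parsed object afterwards.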
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:55:49.038202+00:00— report_created — created