Report #69809
[frontier] Agent outputs malformed JSON or invalid tool arguments despite careful few-shot prompting
Enforce schemas at the token level using constrained decoding (Outlines, Guidance, or llama.cpp grammars): a logits processor masks invalid tokens at every sampling step, so any output that runs to completion (rather than being cut off by a token limit) is syntactically valid for the target JSON, regex, or EBNF grammar, with no retry loops.
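The masking step can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the vocabulary, the hard-coded `fake_logits` scores standing in for a model, and the hand-written DFA for the regex `\[[0-9](,[0-9])*\]`. It does not use the Outlines, Guidance, or llama.cpp APIs; it only shows the idea of setting the logits of grammar-violating tokens to negative infinity before picking the next token.

```python
import json
import math

# Hand-written character-level DFA for the regex \[[0-9](,[0-9])*\]
# (a JSON array of single digits). Real libraries build such a table
# automatically from a schema or regex; this one is illustrative only.
START, ACCEPT = 0, 3

def step(state, ch):
    """Return the next DFA state, or None if `ch` is not allowed in `state`."""
    if state == 0 and ch == "[":
        return 1
    if state == 1 and ch in "0123456789":
        return 2
    if state == 2 and ch == ",":
        return 1
    if state == 2 and ch == "]":
        return ACCEPT
    return None

def advance(state, token):
    """Feed a whole (possibly multi-character) token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

# Hypothetical toy vocabulary, including tokens a chatty model tends to emit.
VOCAB = ["```", "Sure", "[", "]", ",", "1", "2", "[1", ",2"]

def fake_logits(_text):
    """Stand-in for a model: its top choices are a markdown fence and chatter."""
    return [5.0, 4.0, 1.0, 2.5, 1.0, 0.5, 0.4, 2.0, 1.5]

def mask_logits(logits, state):
    """Set the logit of every token that would leave the DFA to -inf."""
    return [score if advance(state, tok) is not None else -math.inf
            for score, tok in zip(logits, VOCAB)]

def constrained_decode(max_steps=20):
    """Greedy decoding over masked logits. Returns (text, reached_accept_state)."""
    text, state = "", START
    for _ in range(max_steps):
        if state == ACCEPT:
            break
        masked = mask_logits(fake_logits(text), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            raise RuntimeError("no vocabulary token is valid in this state")
        text += VOCAB[best]
        state = advance(state, VOCAB[best])
    return text, state == ACCEPT

def unconstrained_first_token():
    """What greedy decoding would pick without the mask."""
    logits = fake_logits("")
    return VOCAB[max(range(len(VOCAB)), key=logits.__getitem__)]

if __name__ == "__main__":
    text, complete = constrained_decode()
    print(unconstrained_first_token(), text, complete, json.loads(text))
```

Note that `constrained_decode` also reports whether the accepting state was reached: with a very small `max_steps` the text is a valid prefix but not yet a complete document, which is why a token limit can still yield unparseable output even under constrained decoding.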
Journey Context:
Prompt engineering for JSON (even with 'Output JSON only') fails on edge cases: trailing commas, markdown fences, unescaped quotes. Post-hoc validation catches these failures but forces retry loops, often with exponential backoff, that add cost and latency. The production pattern, emerging from 2024-2025 research into structured generation, moves the constraint to the inference layer. Libraries such as Outlines or Guidance compile a JSON Schema or regex into a finite-state machine (FSM) that masks logits during sampling, so every token the model emits keeps the output on a valid path. This eliminates parse errors on completed outputs, removes the latency of retries, and supports complex nested schemas or even multi-step generation with dependencies. The tradeoff is a small per-token overhead from the FSM and a dependency on specific inference servers (vLLM, llama.cpp) that support grammar constraints.
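For contrast, the post-hoc validation pattern that this approach aims to replace might look like the minimal sketch below. The `generate` callable is hypothetical (a zero-argument wrapper around whatever model call is in use), and the attempt count and delays are arbitrary illustrative defaults. Each failed parse costs a full extra generation plus the backoff delay.

```python
import json
import time

def generate_json_with_retries(generate, max_attempts=3, base_delay=0.5,
                               sleep=time.sleep):
    """Call `generate()` until its output parses as JSON.

    `generate` is a hypothetical zero-argument callable wrapping a model call.
    Waits base_delay * 2**(attempt - 1) seconds between failed attempts.
    Returns (parsed_value, attempts_used); raises ValueError if all attempts fail.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return json.loads(raw), attempt
        except json.JSONDecodeError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff before paying for another generation.
                sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the loop can be exercised without real delays, for example by passing a list's `append` method to record the backoff schedule.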
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:39:46.115637+00:00— report_created — created