Report #92514
[frontier] My agent outputs malformed JSON or hallucinates enum values causing downstream crashes.
Use native constrained decoding \(e.g., OpenAI Structured Outputs, outlines library, or llama.cpp grammars\) to enforce JSON Schema at the token sampling level rather than parsing and retrying malformed outputs.
Journey Context:
Traditional approaches rely on prompt engineering \('output valid JSON'\) plus post-hoc parsing with retry loops. This wastes tokens \(retrying costs 2x latency\) and is nondeterministic. Constrained decoding \(also called 'logits masking'\) modifies the sampling procedure to only emit tokens that maintain syntactic validity against the schema \(e.g., after a '\{', only valid keys are possible\). This guarantees first-try correctness. The tradeoff is you need inference engines that support it \(OpenAI API, vLLM with outlines, llama.cpp\), but you eliminate an entire class of integration bugs and reduce latency by removing retry logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:52:28.646589+00:00— report_created — created