Report #36362
[frontier] Agents hallucinate invalid tool arguments or bypass safety constraints despite prompt instructions
Use strict Structured Outputs with JSON Schema constraints to prevent invalid actions at the token sampling level, not just in post-processing
Journey Context:
Prompting for JSON is brittle: models emit trailing commas, wrong types, or hallucinated fields, and post-hoc validation only catches these after the text has been generated. Inference APIs with strict structured output support (OpenAI Structured Outputs, Pydantic AI strict mode) use constrained decoding (grammar-based sampling or mask-based constraints) so that the model cannot sample tokens that violate the schema. This acts as a hard guardrail: at each step the sampler masks out invalid tokens, guaranteeing type safety and enum constraints at the generation level. It eliminates parsing failures and blocks 'jailbreak' outputs that do not fit the schema, replacing soft prompt-based guardrails with enforcement in the decoder itself. The guarantee is structural only: a schema-valid argument can still be wrong or unsafe, so semantic checks on the values remain necessary.
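The masking step described above can be illustrated with a toy decoder. This is a minimal sketch, not how any particular inference API implements it: the character-level vocabulary, the `toy_logits` stand-in model, the `END` token, and the tool names are all hypothetical. The stand-in model strongly prefers spelling an invalid tool name, and the mask forces the output back into the allowed enum.

```python
import math

# Hypothetical enum of tool names that the schema permits.
ALLOWED_TOOLS = ["read_file", "list_dir"]
END = "<end>"  # hypothetical end-of-value token


def allowed_next_tokens(prefix, allowed_values):
    """Tokens that keep `prefix` a valid prefix of some allowed value."""
    nxt = set()
    for value in allowed_values:
        if value.startswith(prefix):
            if len(value) == len(prefix):
                nxt.add(END)  # the value is complete, so it may terminate
            else:
                nxt.add(value[len(prefix)])
    return nxt


def toy_logits(prefix):
    """Stand-in for a model that wants to spell the invalid tool 'delete_all'."""
    target = "delete_all"
    vocab = sorted(set("".join(ALLOWED_TOOLS) + target)) + [END]
    logits = {tok: 0.0 for tok in vocab}
    if len(prefix) < len(target):
        logits[target[len(prefix)]] = 5.0
    else:
        logits[END] = 5.0
    return logits


def decode(allowed_values, constrained=True, max_steps=32):
    """Greedy decoding, optionally masking tokens that would leave the enum."""
    prefix = ""
    for _ in range(max_steps):
        scores = toy_logits(prefix)
        if constrained:
            valid = allowed_next_tokens(prefix, allowed_values)
            # Invalid tokens get -inf, so they can never be selected.
            scores = {t: (s if t in valid else -math.inf) for t, s in scores.items()}
        # Iterate in sorted order so ties are broken deterministically.
        token = max(sorted(scores), key=scores.get)
        if token == END:
            return prefix
        prefix += token
    raise RuntimeError("decoding did not terminate")


unconstrained = decode(ALLOWED_TOOLS, constrained=False)
constrained = decode(ALLOWED_TOOLS, constrained=True)
print(unconstrained, constrained)
```

Without the mask, the toy model produces its preferred invalid name; with the mask, every step is restricted to tokens that extend a valid enum value, so the result is always a member of `ALLOWED_TOOLS`. Production systems apply the same idea over a full JSON grammar compiled from the schema rather than a single enum.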
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:30:26.644340+00:00— report_created — created