Report #56465
[frontier] LLM hallucinates tool arguments or calls tools with the wrong types, causing runtime errors (e.g., passing 'tomorrow' to a date field that expects ISO 8601 format)
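Use Pydantic AI to define tools as typed Python functions whose arguments are validated against a schema derived from the type hints. Where the model provider supports it, also enable structured output (constrained decoding) so the LLM is forced to produce schema-valid arguments at the token level, rather than relying only on post-hoc validation.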
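A minimal sketch of the "typed function with validation" half of this advice, using plain Pydantic v2 (`validate_call`) rather than Pydantic AI's own tool decorators. The tool `book_meeting` and the wrapper `call_tool` are hypothetical names chosen for illustration. The sketch shows only post-hoc validation; it does not perform constrained decoding.

```python
from datetime import date

from pydantic import ValidationError, validate_call


@validate_call
def book_meeting(day: date, attendees: int) -> str:
    """Hypothetical tool: arguments are checked against the type hints on every call."""
    return f"Booked {day.isoformat()} for {attendees} attendee(s)"


def call_tool(raw_args: dict) -> str:
    """Run the tool; turn a validation failure into text the model can act on."""
    try:
        return book_meeting(**raw_args)
    except ValidationError as exc:
        # Assumption: the caller feeds this message back to the LLM for a retry.
        return f"Invalid arguments ({exc.error_count()} error(s)); use an ISO date such as 2026-06-21."


print(call_tool({"day": "2026-06-21", "attendees": 2}))
print(call_tool({"day": "tomorrow", "attendees": 2}))
```

An ISO date string is coerced to a `date`, while 'tomorrow' is rejected before the tool body runs, so the error surfaces as a message rather than a crash inside the tool.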
Journey Context:
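Raw function calling (OpenAI/Anthropic) describes tool arguments with JSON Schema, which by default is only advisory: LLMs can still produce malformed JSON or type mismatches (strings where numbers are expected). Runtime validation catches these, but only after the LLM call has been spent, and each retry costs another call. Structured output (constrained decoding) restricts the token vocabulary at generation time to tokens that keep the output valid against the schema, so schema-invalid output becomes structurally impossible. In this setup Pydantic AI derives the schema from the typed function signature and validates the arguments it receives; the token-level constraint is enforced by the model provider where a strict mode is available. This shifts type and shape validation from runtime to generation time. It guarantees structure, not meaning, so field values should still be validated. The tradeoff is flexibility: strict schemas fail on novel edge cases where you want the LLM to 'improvise'. The pattern is to use Union types with a fallback variant that captures intents which do not parse.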
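The Union-with-fallback pattern can be sketched as follows, again with plain Pydantic v2. The models `BookMeeting` and `Unparsed` and the helper `parse_tool_args` are hypothetical names for illustration, not part of Pydantic AI. The sketch assumes the arguments arrive as an already-decoded dict.

```python
from datetime import date
from typing import Union

from pydantic import BaseModel, TypeAdapter, ValidationError


class BookMeeting(BaseModel):
    """Strict variant: a real calendar date and an integer head count."""
    day: date
    attendees: int


class Unparsed(BaseModel):
    """Fallback variant: keeps the raw request for a clarification step."""
    raw_request: str


# One adapter validates against either variant of the union.
TOOL_ARGS = TypeAdapter(Union[BookMeeting, Unparsed])


def parse_tool_args(payload: dict) -> Union[BookMeeting, Unparsed]:
    """Return a validated variant, routing anything invalid to the fallback."""
    try:
        return TOOL_ARGS.validate_python(payload)
    except ValidationError:
        # Neither variant matched; preserve the payload instead of raising.
        return Unparsed(raw_request=str(payload))


ok = parse_tool_args({"day": "2026-06-21", "attendees": 3})
vague = parse_tool_args({"day": "tomorrow", "attendees": 3})
explicit = parse_tool_args({"raw_request": "sometime next week"})
print(type(ok).__name__, type(vague).__name__, type(explicit).__name__)
```

The design choice here is that the model may either select the fallback variant itself (by emitting `raw_request`) or be routed to it when validation fails. Either way the caller handles one extra, well-typed case rather than an exception.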
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:16:12.750861+00:00— report_created — created