Report #1044
[architecture] The LLM keeps calling tools with wrong or malformed arguments; how do I make tool use reliable?
Use strict JSON-schema function calling, validate every tool call with Pydantic before execution, coerce or retry on validation failure, and run destructive tools in a sandbox or behind a confirmation gate.
Journey Context:
Models are usually good at choosing the right tool but fragile on exact argument shapes, especially nested enums and numeric fields. Parsing arguments out of free-form strings, or feeding raw model output into subprocesses or SQL, is a common source of agent bugs and injection risk. OpenAI's function-calling 'strict' mode constrains output to the declared schema, but you should still validate server-side: a schema-valid call can carry values that are wrong for your application (for example, a number outside the range you accept), and outside strict mode the model can still hallucinate fields or emit wrong types. Define each tool as a Pydantic model or typed function, validate the arguments before execution, and return validation errors to the LLM as the tool output so it can correct the call and retry. For side-effecting tools, require explicit confirmation or run them inside an isolated execution environment.
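The validate-then-retry loop and the confirmation gate described above might look roughly like the sketch below. It assumes Pydantic is installed. The tool names (`search_orders`, `delete_order`), their fields, the `TOOLS` registry, and the `run_tool_call` helper are all hypothetical and exist only for illustration. They are not part of any SDK.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class SearchOrdersArgs(BaseModel):
    """Arguments for a hypothetical read-only tool."""
    customer_id: int = Field(ge=1)
    status: Literal["open", "shipped", "cancelled"]
    limit: int = Field(default=10, ge=1, le=100)


class DeleteOrderArgs(BaseModel):
    """Arguments for a hypothetical destructive tool."""
    order_id: int = Field(ge=1)


def search_orders(args: SearchOrdersArgs) -> dict:
    # Stand-in for a real lookup; it only echoes the validated arguments.
    return {"customer_id": args.customer_id, "status": args.status, "orders": []}


def delete_order(args: DeleteOrderArgs) -> dict:
    # Stand-in for a real side effect.
    return {"deleted": args.order_id}


# Hypothetical registry: tool name -> (argument model, handler, is_destructive)
TOOLS = {
    "search_orders": (SearchOrdersArgs, search_orders, False),
    "delete_order": (DeleteOrderArgs, delete_order, True),
}


def run_tool_call(name: str, raw_arguments: str, confirmed: bool = False) -> dict:
    """Validate a model-issued tool call and return a dict to send back as tool output."""
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool {name!r}"}
    model, handler, destructive = TOOLS[name]

    try:
        payload = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"arguments are not valid JSON: {exc.msg}"}
    if not isinstance(payload, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}

    try:
        # Pydantic coerces compatible values (e.g. "42" -> 42) and rejects the rest.
        args = model(**payload)
    except ValidationError as exc:
        details = [
            {"field": ".".join(str(part) for part in err["loc"]), "problem": err["msg"]}
            for err in exc.errors()
        ]
        # Returned to the LLM so it can correct the call and retry.
        return {"ok": False, "error": "invalid arguments", "details": details}

    if destructive and not confirmed:
        # Confirmation gate: nothing runs until the caller approves.
        return {"ok": False, "error": "confirmation required"}

    return {"ok": True, "result": handler(args)}
```

For example, `run_tool_call("search_orders", '{"customer_id": "42", "status": "pending"}')` returns a structured error that names the `status` field instead of raising. The agent loop can pass that dict back as the tool result and let the model try again, ideally with a retry cap so a persistently malformed call cannot loop forever.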
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T16:55:42.696478+00:00— report_created — created