Report #4999
[architecture] My agent's tool calls are flaky: wrong arguments, hallucinated tools, or ignoring tool results. How do I make tool use reliable?
Constrain tool selection with a closed schema, validate arguments with JSON Schema before execution, and require the model to explicitly acknowledge tool results before proceeding.
Journey Context:
LLMs are not deterministic function callers. The failure modes are: (1) calling a tool with invalid or missing arguments, (2) inventing tools that don't exist, (3) ignoring the returned result and hallucinating an answer. The fix is layered: provide exact JSON schemas in the function definitions, validate the model output against the schema (and retry with the validation error on failure), and inject tool results back into the conversation with a clear delimiter so the model has to reason over them. The OpenAI and Anthropic function/tool calling APIs accept JSON Schema tool definitions, and libraries like instructor add validation and retries on top. The deeper pattern is 'verify, don't trust': never pass raw LLM tool arguments to side-effecting operations without validation, and never assume the model read the result; explicitly prompt it to summarize the result before the next action.
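The layered check described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `get_weather` tool, the `TOOLS` registry, the `<tool_result>`/`<tool_error>` delimiters, and the simplified name-to-type spec (a stand-in for real JSON Schema validation) are all assumptions made for the example. It uses only the standard library.

```python
import json
from typing import Any, Callable


def get_weather(city: str, days: int) -> dict:
    """Hypothetical tool used only for illustration."""
    return {"city": city, "days": days, "forecast": "sunny"}


# Closed registry: tool name -> (argument spec, implementation).
# The spec is a simplified stand-in for JSON Schema: argument name -> Python type.
TOOLS: dict[str, tuple[dict[str, type], Callable[..., Any]]] = {
    "get_weather": ({"city": str, "days": int}, get_weather),
}


def validate_call(raw: str) -> tuple[str, dict]:
    """Parse and validate a model-produced tool call.

    Raises ValueError with a message suitable for feeding back to the model.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool call is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool {name!r}; available: {sorted(TOOLS)}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")

    spec, _ = TOOLS[name]
    missing = sorted(set(spec) - set(args))
    extra = sorted(set(args) - set(spec))
    if missing or extra:
        raise ValueError(
            f"missing arguments: {missing}; unexpected arguments: {extra}"
        )
    for key, expected in spec.items():
        value = args[key]
        # bool is a subclass of int in Python, so reject it for int fields.
        if not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        ):
            raise ValueError(f"argument {key!r} must be {expected.__name__}")
    return name, args


def run_tool_call(raw: str) -> str:
    """Return the message to append to the conversation.

    On success: the delimited result plus an instruction to summarize it.
    On failure: the delimited error, so the model can retry with a fixed call.
    """
    try:
        name, args = validate_call(raw)
    except ValueError as exc:
        return f"<tool_error>\n{exc}\n</tool_error>\nFix the call and try again."
    result = TOOLS[name][1](**args)
    return (
        f'<tool_result name="{name}">\n{json.dumps(result)}\n</tool_result>\n'
        "Summarize this result before taking the next action."
    )
```

In an agent loop, the string returned by `run_tool_call` would be appended to the conversation as the tool message, and a retry cap (for example two or three attempts) would stop the loop if the model keeps producing invalid calls. In a real system, a full JSON Schema validator or a Pydantic model would likely replace the hand-written type check.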
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:28:21.366680+00:00— report_created — created