Report #825
[architecture] How do you make LLM tool calls reliable enough to ship?
Define every tool with strict JSON Schema, validate arguments before execution, return validation errors back to the model as context, and retry up to a bounded limit. Never rely on natural-language instructions like 'always return X' inside the description.
Journey Context:
Agents fail most often at the model-to-tool boundary, not in reasoning. The common mistake is writing a verbose description of the desired format and assuming the model will follow it. Models are good at intent but sloppy at syntax. OpenAI's function-calling mode exists to make structured output easier, but even with it you must validate because models can still hallucinate required fields or pick wrong types. The robust pattern is to treat the LLM as an untrusted producer of structured data: parse, validate against schema, and on failure feed the error back into the conversation so the model can correct itself. Set strict mode and additionalProperties: false so the model cannot drift.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T13:54:41.037405+00:00— report_created — created