Report #3590
[architecture] How do I make tool use reliable in production agents?
Enable strict mode on JSON schemas, return structured errors (not exceptions) as tool results, namespace related tools, keep the number of tools exposed per turn under about 20, and hard-limit iteration budgets. Treat tool descriptions as API contracts with use-when and do-not-use guidance.
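As a rough illustration of a strict, namespaced tool definition, here is a minimal sketch in the Chat Completions function-tool shape. The tool name `orders_get_status`, its fields, and the helper `strict_mode_problems` are hypothetical and exist only for this example; the helper checks just the top-level schema rules (strict enabled, `additionalProperties` false, every property listed in `required`), not nested objects.

```python
# Hypothetical tool definition; the name and fields are illustrative only.
GET_ORDER_TOOL = {
    "type": "function",
    "function": {
        # The "orders_" prefix acts as a simple namespace for related tools.
        "name": "orders_get_status",
        # The description doubles as the contract: what, use-when, do-not-use.
        "description": (
            "Look up the shipping status of one order by its ID. "
            "Use when the user asks where an existing order is. "
            "Do not use to search orders by customer name or to modify an order."
        ),
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "Order ID, e.g. 'A-1042'.",
                },
            },
            "required": ["order_id"],
            "additionalProperties": False,
        },
    },
}


def strict_mode_problems(tool):
    """Return reasons the top-level schema would not satisfy strict mode."""
    fn = tool["function"]
    params = fn["parameters"]
    problems = []
    if fn.get("strict") is not True:
        problems.append("strict is not enabled")
    if params.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    missing = set(params.get("properties", {})) - set(params.get("required", []))
    if missing:
        problems.append(f"properties not listed in required: {sorted(missing)}")
    return problems
```

Running a check like this in CI, before the definitions ever reach the model, is one way to catch schema drift early.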
Journey Context:
OpenAI's function-calling guide shows that tool definitions count as input tokens and that selection accuracy degrades with too many tools. Strict mode enforces schema conformance; structured errors let the LLM self-correct instead of breaking the loop; namespaces reduce ambiguity. The failure modes that matter are selection (wrong tool), schema (invalid args), execution (transient or logical error), and parsing (output too large). Designing tools like a small, typed API surface is higher leverage than any prompt engineering.
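The structured-error and iteration-budget ideas can be sketched without any SDK. Everything here is an assumption made for illustration: `model_step` is a hypothetical callable standing in for the LLM call, the error envelope fields (`ok`, `error.type`, `retryable`) are one possible shape rather than a standard, and the budget of 5 is arbitrary.

```python
import json

MAX_ITERATIONS = 5  # hard budget; the value is an assumption


def _error(kind, message, retryable):
    """Build the (assumed) error envelope returned to the model."""
    return json.dumps(
        {"ok": False, "error": {"type": kind, "message": message, "retryable": retryable}}
    )


def run_tool(registry, name, raw_args):
    """Execute a tool and always return a JSON string instead of raising."""
    if name not in registry:
        return _error("unknown_tool", f"No tool named {name!r}", False)
    try:
        args = json.loads(raw_args)
        return json.dumps({"ok": True, "result": registry[name](**args)})
    except (json.JSONDecodeError, TypeError) as exc:
        # Invalid JSON, a non-object payload, or missing/unexpected parameters.
        return _error("invalid_arguments", str(exc), True)
    except Exception as exc:
        # Anything else the tool itself raised.
        return _error("execution_error", str(exc), False)


def agent_loop(model_step, registry, messages, max_iterations=MAX_ITERATIONS):
    """Run tool calls until the model finishes or the budget is spent."""
    for _ in range(max_iterations):
        step = model_step(messages)  # hypothetical stand-in for the LLM call
        if "final" in step:
            return {"status": "done", "answer": step["final"]}
        call = step["tool_call"]
        messages.append(
            {
                "role": "tool",
                "name": call["name"],
                "content": run_tool(registry, call["name"], call["arguments"]),
            }
        )
    return {"status": "budget_exhausted", "answer": None}
```

Because a failed call comes back as an ordinary tool result, the model sees the error type and can retry with corrected arguments, while the budget guarantees the loop ends even if it never does.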
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:36:18.208178+00:00— report_created — created