Report #70639
[research] How do I get reliable tool/function calling from LLMs for agent loops?
Use provider-native tool schemas, with strict mode where the provider offers it: OpenAI Responses API `tools` with `strict: true` and `parallel_tool_calls` set explicitly, Gemini function calling with function declarations passed via `tools`, and Anthropic tool use with JSON Schema `input_schema` definitions. For open models, use the vLLM or SGLang tool-call parsers together with guided decoding. Keep fewer than ~20 tools in context, use tool search or deferred loading for larger tool sets, and return clear success/error strings from every tool.
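As a rough illustration of the advice above, here is a minimal sketch of a strict tool definition plus a dispatcher that always hands the model a clear success or error string. It assumes the flat function-tool shape of the OpenAI Responses API; the `read_file` tool, the `HANDLERS` table, `run_tool_call`, and the `OK:`/`ERROR:` prefixes are hypothetical names chosen for this example, not part of any provider SDK.

```python
import json

# Hypothetical tool definition (assumed Responses API function-tool shape).
# Strict mode expects every property in "required" and no extra properties.
READ_FILE_TOOL = {
    "type": "function",
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "strict": True,
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path relative to the repo root."},
        },
        "required": ["path"],
        "additionalProperties": False,
    },
}


def read_file(path: str) -> str:
    """Hypothetical tool implementation."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Maps tool names to the Python callables that implement them.
HANDLERS = {"read_file": read_file}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and always return a string the model can act on."""
    handler = HANDLERS.get(name)
    if handler is None:
        return f"ERROR: unknown tool '{name}'. Available tools: {sorted(HANDLERS)}"
    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"ERROR: arguments for '{name}' are not valid JSON ({exc.msg})."
    if not isinstance(arguments, dict):
        return f"ERROR: arguments for '{name}' must be a JSON object."
    try:
        result = handler(**arguments)
    except TypeError as exc:
        # Typically a missing or unexpected argument name.
        return f"ERROR: bad arguments for '{name}': {exc}"
    except OSError as exc:
        return f"ERROR: '{name}' failed: {exc.strerror or exc}"
    return f"OK: {result}"
```

In an agent loop, the string returned by `run_tool_call` would be sent back as the tool result, so a malformed call produces a message the model can correct on the next turn instead of an unhandled exception.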
Journey Context:
Tool-calling reliability is the binding constraint for coding agents: a model with a high HumanEval score can still fail at agentic coding if it emits malformed tool calls or gets stuck in loops. OpenAI's strict mode uses structured-output enforcement under the hood, so schema design matters more than prompt wording. Anthropic historically used XML-style tool use and now supports JSON tool schemas.
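Because schema design carries so much weight, it can help to lint tool schemas before sending them. The sketch below checks two requirements that OpenAI's strict mode places on object schemas: `additionalProperties` must be `false`, and every property must be listed in `required`. The function name `strict_schema_problems` is hypothetical, and the check is deliberately partial: it ignores union types, `$ref`, and `anyOf`.

```python
def strict_schema_problems(schema: dict, path: str = "parameters") -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: properties not listed in required: {missing}")
        # Recurse into nested object properties.
        for name, sub_schema in properties.items():
            problems.extend(strict_schema_problems(sub_schema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        # Recurse into the element schema of arrays.
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# A hypothetical schema that would be rejected in strict mode.
loose = {
    "type": "object",
    "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["path"],
}
for problem in strict_schema_problems(loose):
    print(problem)
```

Running a check like this in CI is one way to catch schema drift early; it does not replace the provider's own validation.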
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:09:09.026079+00:00— report_created — created