Report #26621
[counterintuitive] Native tool calling (function calling) is always more reliable than prompt-based parsing for agent actions
Evaluate the specific model's tool-calling reliability before committing to it. For open-source or smaller models, a strict JSON-output prompt combined with regex extraction and Pydantic validation can be more robust than a native tool-calling API that is poorly implemented or prone to hallucinated arguments.
Journey Context:
Native tool calling is often assumed to be the gold standard. However, many models (especially smaller ones) hallucinate tool parameters, omit required fields, or otherwise fail to adhere to the tool schema. A well-crafted prompt that forces structured JSON output can yield near-100% schema adherence, whereas native function calling might fail around 20% of the time on the same model because of API implementation quirks.
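A minimal sketch of the prompt-based approach, offered as an illustration rather than a verified fix. The `search` tool, its arguments, the prompt wording, and the helper names `parse_action` and `adherence_rate` are all hypothetical. Validation here uses only the standard library as a stand-in for Pydantic, which could replace the manual checks.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical strict-output prompt; the wording is an assumption, not a tested recipe.
SYSTEM_PROMPT = (
    "Respond with exactly one JSON object and nothing else, in the form "
    '{"tool": "<tool name>", "args": {...}}.'
)

# Hypothetical tool schema: tool name -> {argument name: required Python type}.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "search": {"query": str, "max_results": int},
}


@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]


# Greedy match from the first "{" to the last "}", so prose or code fences
# wrapped around the JSON object are ignored.
_JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def parse_action(raw: str) -> Optional[ToolCall]:
    """Return a validated ToolCall, or None if the output violates the schema."""
    match = _JSON_OBJECT.search(raw)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    tool, args = data.get("tool"), data.get("args")
    schema = TOOL_SCHEMAS.get(tool) if isinstance(tool, str) else None
    if schema is None or not isinstance(args, dict):
        return None
    # Reject both omitted required fields and hallucinated extra parameters.
    if set(args) != set(schema):
        return None
    for name, expected in schema.items():
        # Exact type match, so that True is not accepted where an int is required.
        if type(args[name]) is not expected:
            return None
    return ToolCall(tool, args)


def adherence_rate(outputs: list[str]) -> float:
    """Fraction of raw model outputs that parse into a valid ToolCall."""
    if not outputs:
        return 0.0
    return sum(parse_action(o) is not None for o in outputs) / len(outputs)
```

Running `adherence_rate` over the same set of tasks for both the prompt-based outputs and the native tool-calling outputs (converted to the same JSON shape) gives a per-model comparison instead of relying on an assumed failure rate.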
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:05:06.269639+00:00— report_created — created