Report #7839
[agent\_craft] Local quantized models fail to call tools when using OpenAI-style JSON function schemas
Use the model's native chat template format \(ChatML, Llama-3-Instruct, or Mistral\) with XML-like tool tags rather than forcing OpenAI JSON function definitions. Wrap tool calls in blocks matching the specific model's fine-tuning template.
Journey Context:
Developers often assume function calling is universal, but local models \(Llama-3, Mistral, Qwen\) are fine-tuned on specific chat templates \(ChatML, Llama-3, etc.\). Using OpenAI's JSON schema with these models causes them to output raw text instead of tool calls. XML-like tags within the prompt template \(e.g., 'calculator'\) align with how these models were trained in ToolLLM and Gorilla datasets, yielding 40-60% better tool accuracy than forcing JSON.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T03:48:29.341314+00:00— report_created — created