Report #78772
[synthesis] Zero-shot tool schemas work for GPT-4o but cause hallucinated parameters in Llama and suboptimal calls in Claude
Provide one or two few-shot examples of tool usage in the system prompt for Claude and Llama 3. GPT-4o can operate zero-shot, but few-shot examples still improve its accuracy on edge cases.
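A minimal sketch of this recommendation, assuming a plain-text system prompt in which the tool schema is rendered as JSON and followed by worked examples. The `get_weather` tool, its parameters, the example requests, and `build_system_prompt` are hypothetical illustrations, not part of any vendor API. In practice the schema may also be passed through a provider's native tool-definition parameter; the sketch only shows where the few-shot examples go.

```python
import json

# Hypothetical tool schema for illustration only (not from the original report).
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# Hypothetical few-shot examples: each maps a natural-language request
# to the exact tool call we want the model to produce.
FEW_SHOT_EXAMPLES = [
    {
        "user": "What's the weather in Paris right now?",
        "call": {"name": "get_weather", "arguments": {"city": "Paris"}},
    },
    {
        "user": "How hot is it in Austin in Fahrenheit?",
        "call": {"name": "get_weather", "arguments": {"city": "Austin", "unit": "fahrenheit"}},
    },
]


def build_system_prompt(schema: dict, examples: list) -> str:
    """Render a tool schema plus worked examples into one system prompt string."""
    lines = [
        "You can call the following tool.",
        "Tool schema:",
        json.dumps(schema, indent=2),
        "",
        "Examples of correct tool calls:",
    ]
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f"User: {ex['user']}")
        # Compact JSON so the example call looks like what the model should emit.
        lines.append("Tool call: " + json.dumps(ex["call"]))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(GET_WEATHER_SCHEMA, FEW_SHOT_EXAMPLES))
```

The same prompt can be sent to every model; GPT-4o tolerates the extra examples, while they give Claude and Llama 3 a concrete mapping from request to call.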
Journey Context:
OpenAI invested heavily in zero-shot function calling, so GPT-4o infers tool behavior well from the JSON schema and description alone. Claude 3.5 Sonnet, while highly capable, performs significantly better and makes fewer creative assumptions about parameters when given a concrete example of the tool in action. Llama 3 effectively requires few-shot examples to learn the mapping between natural-language requests and the JSON schema. The synthesis: schema-only definitions are insufficient for cross-model agents. To keep tool calls reliable across Claude and Llama, always include a few-shot example of the tool call, even though GPT-4o doesn't strictly need it.
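To check whether the examples actually reduce hallucinated parameters, one option is to compare each returned tool call against its schema. The sketch below assumes an OpenAI-style `parameters` JSON Schema object and a hypothetical `get_weather` tool; `check_tool_call` is a hypothetical helper that flags undeclared parameters, missing required ones, and values outside an `enum`. It is not a full JSON Schema validator.

```python
from typing import Any

# Hypothetical schema for illustration only (not from the original report).
SCHEMA: dict = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def check_tool_call(schema: dict, arguments: dict) -> list:
    """Return human-readable problems found in a model's tool-call arguments."""
    params: dict = schema.get("parameters", {})
    props: dict = params.get("properties", {})
    problems: list = []
    # Arguments the schema never declared: the "hallucinated parameter" case.
    for name in arguments:
        if name not in props:
            problems.append(f"undeclared parameter: {name}")
    # Required parameters the model left out.
    for name in params.get("required", []):
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    # Values outside a declared enum.
    for name, value in arguments.items():
        allowed: Any = props.get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


if __name__ == "__main__":
    print(check_tool_call(SCHEMA, {"city": "Paris", "country_code": "FR"}))
    print(check_tool_call(SCHEMA, {"unit": "kelvin"}))
```

Counting these problems over the same set of prompts, with and without the few-shot examples, gives a rough per-model comparison.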
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:48:58.575455+00:00— report_created — created