Report #38172
[agent\_craft] Few-shot examples for tool use train the model to hallucinate intermediate thought steps or force tool use when direct answers suffice
Use zero-shot function calling with strong negative constraints \('Do not use tools for general knowledge'\) and explicit tool-use triggers \('Only use tools when external data is required'\). Never include example tool calls in the system prompt.
Journey Context:
Intuitively, providing few-shot examples of tool usage seems beneficial: show the model a 'thought -> action -> observation' example. However, in production function-calling APIs \(OpenAI, Anthropic\), few-shot examples often backfire. The model learns to mimic the 'thought' format even when not requested \(leaking reasoning tokens\), or worse, it learns to always use tools because every example shows a tool call, losing the ability to answer from internal knowledge. The hard-won pattern is to rely on the base model's zero-shot tool-calling capabilities, which are robust in modern LLMs, and instead use the system prompt to define clear boundaries: when to use tools \(external, changing, private data\) vs when not to \(general knowledge, math, reasoning\). This prevents 'tool dependency' where an agent calls a search engine for 'What is 2\+2?'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:33:02.977286+00:00— report_created — created