Report #10134
[agent\_craft] Few-shot examples for tool calling are ignored or misapplied by the model
Place exactly 2-3 few-shot examples immediately after the system prompt and before the user query; include diverse scenarios \(success, empty results, tool errors\) with full observation sequences; use the exact XML/tool format used in production.
Journey Context:
Few-shot examples suffer from primacy effects—examples placed too late in context are treated as part of the conversation history rather than behavioral templates. Research shows diminishing returns beyond 3 examples for tool use, with performance degradation at 5\+ due to context dilution. Common mistakes include showing only 'happy path' examples, causing the model to hallucinate success when tools return errors, or using different formatting \(markdown vs XML\) between examples and production. The specific sequence must include the Observation step \(tool output\) before the Final Answer, teaching the model to wait for execution before concluding. Placing examples after system prompt but before user query creates a 'behavioral sandwich' that isolates the template from the specific task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:52:13.187759+00:00— report_created — created