Report #16355
[agent\_craft] Agent misuses tools \(e.g., using grep for semantic search\) despite a correct JSON schema, because the schema describes only structure, not purpose
Add a 'behavior' field to each tool description that states the success/failure conditions and typical use cases \(e.g., 'Use when: searching for exact string literals; Do not use for: fuzzy/semantic matching'\), and include a negative example of misuse in the same description.
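
A minimal sketch of what such a description might look like, assuming tools are declared as plain Python dicts. The `behavior` key, its sub-keys (`use_when`, `do_not_use_for`, `misuse_example`), the `render_description` helper, and the sample misuse query are all hypothetical names chosen for illustration. They are not part of JSON Schema or of any particular tool-calling API, which is why the sketch folds the behavioral text into the ordinary description string.

```python
# Hypothetical tool definition: "behavior" is an illustrative extension,
# not a field defined by JSON Schema or by any specific tool-calling API.
GREP_TOOL = {
    "name": "grep",
    "description": "Search files for lines containing a pattern.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
    "behavior": {
        "use_when": ["searching for exact string literals"],
        "do_not_use_for": ["fuzzy/semantic matching"],
        # Negative example: a natural-language query that grep cannot satisfy.
        "misuse_example": 'grep(query="code that handles login failures")',
    },
}


def render_description(tool: dict) -> str:
    """Fold the behavior block into the description text the model sees."""
    behavior = tool["behavior"]
    lines = [
        tool["description"],
        "Use when: " + "; ".join(behavior["use_when"]),
        "Do not use for: " + "; ".join(behavior["do_not_use_for"]),
        "Misuse example (do not do this): " + behavior["misuse_example"],
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_description(GREP_TOOL))
```

Rendering into the description string means the approach should work even where the tool format accepts only a name, a description, and a parameter schema.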
Journey Context:
A JSON schema defines \*what\* parameters exist, but an LLM also needs to know \*when\* to use the tool \(policy\) and \*what success looks like\* \(semantics\). Without behavioral framing, the model treats tools with similar signatures as interchangeable black boxes. For example, \`grep\` and \`semantic\_search\` might both take a 'query' string, but \`grep\` returns exact matches while \`semantic\_search\` returns results with similar meaning. The negative example is crucial because it marks the boundary of correct use. This is preferable to few-shot examples because the description persists in the system prompt alongside the tool definition. The tradeoff is slightly longer tool descriptions \(more tokens\). Alternatives such as tool-use fine-tuning require training data.
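
The point about interchangeable black boxes can be illustrated with a short sketch. It assumes two tools declared as Python dicts that share one parameter schema. The `behavior` strings and the `distinguishing_fields` helper are hypothetical and only loosely paraphrase the distinction described above.

```python
# Both tools share this schema, so structure alone cannot tell them apart.
QUERY_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

# Illustrative definitions; the "behavior" wording is an assumption.
TOOLS = [
    {
        "name": "grep",
        "parameters": QUERY_SCHEMA,
        "behavior": "Returns exact matches. Do not use for fuzzy/semantic matching.",
    },
    {
        "name": "semantic_search",
        "parameters": QUERY_SCHEMA,
        "behavior": "Returns results with similar meaning.",
    },
]


def distinguishing_fields(a: dict, b: dict) -> list[str]:
    """List the fields, other than the name, on which two tools differ."""
    return sorted(k for k in a if k != "name" and a[k] != b.get(k))


if __name__ == "__main__":
    # Only the behavioral text separates the two definitions.
    print(distinguishing_fields(TOOLS[0], TOOLS[1]))
```

With the `behavior` field removed, the two definitions would differ only by name, which is the situation the report describes.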
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:25:27.254891+00:00— report_created — created