Report #96183
[research] Agent selects the correct tool but hallucinates invalid or suboptimal arguments
Attach tool.name and tool.args.schema_validation_result as OpenTelemetry span attributes. Create an eval that measures JSON Schema compliance of the generated arguments against the tool's input schema, or their edit distance from reference arguments.
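A minimal sketch of the span-attribute idea, using only the standard library. Several things here are assumptions rather than part of the report: `validate_args` is a simplified stand-in for a full JSON Schema validator (it only checks `required` and per-property `type`, and assumes each `type` is a single string); `RecordingSpan` is a hypothetical test double; and the attribute names `tool.args.raw` and `tool.args.schema_validation_errors` are illustrative. `record_tool_call` only relies on the span exposing `set_attribute(key, value)`, which OpenTelemetry spans provide.

```python
import json
from typing import Any

# Subset of JSON Schema type names mapped to Python types (assumption: single-string types only).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_args(args: Any, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the arguments comply."""
    if not isinstance(args, dict):
        return ["arguments are not a JSON object"]
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        prop_type = schema.get("properties", {}).get(name, {}).get("type")
        expected = _JSON_TYPES.get(prop_type)
        if expected is None:
            continue  # unknown or untyped properties are tolerated in this sketch
        # bool is a subclass of int in Python, so reject it for numeric types explicitly.
        bool_as_number = isinstance(value, bool) and prop_type in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {prop_type}")
    return errors


def record_tool_call(span, tool_name: str, raw_args: str, schema: dict) -> bool:
    """Validate the raw argument string and attach the outcome to `span`."""
    try:
        errors = validate_args(json.loads(raw_args), schema)
    except json.JSONDecodeError as exc:
        errors = [f"malformed JSON: {exc.msg}"]
    span.set_attribute("tool.name", tool_name)
    span.set_attribute("tool.args.raw", raw_args)  # hypothetical attribute name
    span.set_attribute("tool.args.schema_validation_result", "fail" if errors else "pass")
    span.set_attribute("tool.args.schema_validation_errors", errors)  # hypothetical attribute name
    return not errors


class RecordingSpan:
    """Hypothetical stand-in for a tracing span; stores attributes in a dict."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


# Example: the agent picked the right tool but passed a number where a string is required.
read_file_schema = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}
span = RecordingSpan()
ok = record_tool_call(span, "read_file", '{"path": 42}', read_file_schema)
print(ok, span.attributes["tool.args.schema_validation_errors"])
```

In a real setup, the `span` argument would likely be the active OpenTelemetry span, and the hand-rolled validator would be replaced with a complete JSON Schema implementation.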
Journey Context:
Standard evals check whether the right tool was called (e.g., read_file vs write_file), but agents often pass malformed JSON or wrong argument types. Observability must capture the exact arguments and validate them against the tool's OpenAPI or JSON Schema definition, yielding two separate metrics: Tool Selection Accuracy and Argument Schema Compliance.
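One way to keep the two metrics distinct is sketched below. The `ToolCallRecord` fields and the `score` function are hypothetical names for illustration. The sketch also makes a design assumption: argument compliance is computed only over calls where the tool was selected correctly, so a wrong tool choice does not also count as an argument failure.

```python
from dataclasses import dataclass


@dataclass
class ToolCallRecord:
    expected_tool: str  # tool the eval case expects
    called_tool: str    # tool the agent actually called
    args_valid: bool    # whether the generated arguments passed schema validation


def score(records: list[ToolCallRecord]) -> dict[str, float]:
    """Compute Tool Selection Accuracy and Argument Schema Compliance separately."""
    if not records:
        return {"tool_selection_accuracy": 0.0, "argument_schema_compliance": 0.0}
    # Only calls with the correct tool are eligible for the argument metric.
    correct = [r for r in records if r.called_tool == r.expected_tool]
    selection = len(correct) / len(records)
    compliance = sum(r.args_valid for r in correct) / len(correct) if correct else 0.0
    return {
        "tool_selection_accuracy": selection,
        "argument_schema_compliance": compliance,
    }


# Example: three of four calls pick the right tool; two of those three have valid arguments.
records = [
    ToolCallRecord("read_file", "read_file", True),
    ToolCallRecord("read_file", "read_file", False),
    ToolCallRecord("write_file", "write_file", True),
    ToolCallRecord("read_file", "write_file", True),
]
print(score(records))
```

Reporting the numbers side by side makes it visible when an agent scores well on tool choice while still producing arguments that would fail at call time.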
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:01:28.390805+00:00— report_created — created