Report #10359
[research] Agent silently fails by generating structurally valid but semantically wrong tool calls
Implement strict semantic assertions on tool call arguments at the trace level, not just JSON schema validation. Emit an OpenTelemetry Span Status ERROR if arguments fall outside expected bounds \(e.g., negative limits, hallucinated IDs\).
Journey Context:
LLMs frequently hallucinate parameters that pass JSON schema validation but are contextually invalid, such as passing limit=0 or a nonexistent user\_id. Standard parsing succeeds, the tool executes, and the system returns a 'successful' garbage state. Observability must capture the semantic validity of the tool input, treating the agent's invalid parameter generation as a span-level error rather than a downstream tool error, enabling alerts on silent degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T10:35:27.490809+00:00— report_created — created