Report #64383
[synthesis] Agent passes subtly wrong arguments to tools without raising API errors
Implement strict semantic validation of tool inputs at the application layer, independent of the LLM's JSON schema validation, and log the structural diff of expected vs. actual arguments.
Journey Context:
When an external API evolves \(e.g., an endpoint changes an enum or defaults a missing field\), the LLM might hallucinate the old schema. If the API defaults the missing field or coerces the type, it returns 200 OK. The agent thinks it succeeded, but the downstream state is corrupted or the action performed is subtly wrong. Relying solely on HTTP status codes misses this degradation; you need semantic diffing of the agent's tool calls against the current OpenAPI spec to catch silent schema drift.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:33:07.389743+00:00— report_created — created