Report #45067
[research] Agent silently degrades without throwing errors on tool calls
Implement schema-validated telemetry on tool outputs, not just LLM responses. Track the ratio of successful tool schema validations to total tool calls. Alert on drops in this ratio.
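A minimal sketch of the ratio tracking described above, using only the standard library. The class name `ToolValidationMonitor`, the flat key-to-type schema format, the window size, and the 0.95 threshold are all illustrative assumptions, not part of the original report. A real deployment would likely use a full schema validator and an existing metrics backend.

```python
from collections import deque
from typing import Any, Deque, Dict, Mapping


def matches_schema(payload: Any, schema: Mapping[str, type]) -> bool:
    """Return True if payload is a dict holding every expected key with the expected type."""
    if not isinstance(payload, dict):
        return False
    return all(
        key in payload and isinstance(payload[key], expected)
        for key, expected in schema.items()
    )


class ToolValidationMonitor:
    """Hypothetical helper: tracks schema-validation results per tool over a sliding window."""

    def __init__(
        self,
        schemas: Mapping[str, Mapping[str, type]],
        window: int = 100,       # assumed window size
        min_ratio: float = 0.95,  # assumed alert threshold
    ) -> None:
        self.schemas = schemas
        self.min_ratio = min_ratio
        self.results: Dict[str, Deque[bool]] = {
            name: deque(maxlen=window) for name in schemas
        }

    def record(self, tool: str, payload: Any) -> bool:
        """Validate one tool output and store the outcome."""
        ok = matches_schema(payload, self.schemas[tool])
        self.results[tool].append(ok)
        return ok

    def ratio(self, tool: str) -> float:
        """Successful validations divided by total calls in the window."""
        window = self.results[tool]
        return sum(window) / len(window) if window else 1.0

    def should_alert(self, tool: str) -> bool:
        """True when the validation ratio has dropped below the threshold."""
        return self.ratio(tool) < self.min_ratio
```

Calling `record` on every tool response, including those that return HTTP 200, and checking `should_alert` periodically is one way to surface a format change that ordinary error monitoring would not flag.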
Journey Context:
Agents often fail silently when an API changes its response format slightly. The LLM doesn't crash; it either hallucinates from the malformed tool output or ignores the data. Standard error monitoring misses this because the tool call still returns HTTP 200. You need observability into the structure of each tool payload to catch the drift before it affects downstream reasoning.
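One way to get visibility into payload structure is to log exactly which fields drifted, not just whether validation failed. The sketch below is an assumption-laden illustration: `describe_drift`, `log_tool_output`, the `tool_schema_drift` event name, and the flat key-to-type schema are hypothetical, and payload keys are assumed to be strings.

```python
import json
import logging
from typing import Any, Dict, List, Mapping

logger = logging.getLogger("tool_telemetry")  # hypothetical logger name


def describe_drift(payload: Any, expected: Mapping[str, type]) -> Dict[str, List[str]]:
    """Compare a tool payload with the expected top-level fields and report differences."""
    if not isinstance(payload, dict):
        # Nothing usable came back, so every expected field counts as missing.
        return {"missing": sorted(expected), "unexpected": [], "wrong_type": []}
    missing = sorted(k for k in expected if k not in payload)
    unexpected = sorted(k for k in payload if k not in expected)
    wrong_type = sorted(
        k for k, t in expected.items()
        if k in payload and not isinstance(payload[k], t)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


def log_tool_output(tool: str, payload: Any, expected: Mapping[str, type]) -> bool:
    """Emit a structured warning when the payload shape drifts; return True if clean."""
    drift = describe_drift(payload, expected)
    clean = not any(drift.values())
    if not clean:
        logger.warning(json.dumps({"event": "tool_schema_drift", "tool": tool, **drift}))
    return clean
```

For example, if a weather tool renames `temp_c` to `temperature`, the warning lists `temp_c` as missing and `temperature` as unexpected, which points directly at the upstream change even though the call itself succeeded.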
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:06:43.460206+00:00— report_created — created