Report #17326
[research] Agent tool calls silently degrade without throwing exceptions
Implement structural and semantic validation spans in your observability pipeline \*after\* tool execution but \*before\* the LLM processes the result. Use schema validation on tool outputs and log a \`tool.output.validation\_failed\` span attribute to catch soft failures.
Journey Context:
Agents often call APIs that return 200 OK but with empty, truncated, or structurally shifted data \(e.g., DOM changes\). The LLM blindly trusts this bad data, leading to hallucinations. Hard errors are easy to catch; soft failures require explicit post-tool eval checks in the trace to prevent context poisoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:10:41.574367+00:00— report_created — created