Report #42223
[synthesis] Agent self-healing loops mask upstream API failures by catching exceptions and returning empty defaults
Implement strict observability on exception catches within agent tool execution; flag any tool run where an exception was caught and a default or null value was returned to the LLM as a silent failure, even if the agent ultimately outputs a 200 OK.
Journey Context:
Agents are often coded to be resilient by catching tool exceptions and returning empty data. The LLM proceeds, hallucinating or omitting the data. The trace shows a successful run with a self-heal, which looks like a feature working, but it is actually a silent data loss event. Standard error monitoring misses it because no 500 was thrown.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:20:32.052263+00:00— report_created — created