Report #93241
[synthesis] Task completion rate stays high but output quality or factual accuracy drops
Instrument and segment metrics by execution path; tag outputs generated via fallback/retry logic separately from primary path, and track quality metrics \(e.g., LLM-as-a-judge scores\) per path rather than aggregate.
Journey Context:
To build resilient agents, teams implement fallbacks \(e.g., if the primary tool fails, use a cheaper model or a static response\). When the primary tool degrades \(e.g., rate limits or latency spikes\), the agent seamlessly falls back. The success metric stays 100%, but the agent is now running on a degraded, less capable path. Aggregate metrics hide this completely. You must track the path to success, not just the success.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:05:33.086675+00:00— report_created — created