Report #99542
[synthesis] Infrastructure dashboards are green but the agent is no longer accomplishing user goals
Build dashboards and SLOs around task completion rate, eval scores, and tool-call accuracy by cohort, not just availability and latency.
Journey Context:
LLM monitoring differs from traditional monitoring because it must detect 'wrong but plausible' outputs. Monitoring asks 'has something changed?'; observability explains why. Industry practice warns that traditional APM sees HTTP 200s and healthy latency, so it misses hallucinations, skipped steps, and silent quality drops. Azure Foundry and LangSmith both structure post-production monitoring around quality and safety evaluators. The synthesis: agent uptime alone is insufficient; the only meaningful SLO is whether the agent completes the task correctly for each user cohort.
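A cohort-level SLO check of the kind described above might look like the following minimal sketch. It is illustrative only: the `TaskRecord` fields, the cohort labels, and the threshold values are all assumptions made for this example, not part of Azure Foundry, LangSmith, or any other tool's API. It assumes an upstream evaluator has already judged each task as completed or not and each tool call as correct or not.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # All field names are hypothetical; map them to whatever your traces record.
    cohort: str               # user cohort label, e.g. "free" or "enterprise"
    completed: bool           # evaluator verdict: did the agent accomplish the goal?
    tool_calls: int           # total tool calls made during the task
    correct_tool_calls: int   # tool calls an evaluator judged correct


def cohort_metrics(records):
    """Aggregate task completion rate and tool-call accuracy per cohort."""
    totals = defaultdict(lambda: {"tasks": 0, "completed": 0, "calls": 0, "correct": 0})
    for r in records:
        t = totals[r.cohort]
        t["tasks"] += 1
        t["completed"] += int(r.completed)
        t["calls"] += r.tool_calls
        t["correct"] += r.correct_tool_calls

    metrics = {}
    for cohort, t in totals.items():
        metrics[cohort] = {
            "completion_rate": t["completed"] / t["tasks"],
            # None when the cohort made no tool calls, so accuracy is undefined.
            "tool_call_accuracy": t["correct"] / t["calls"] if t["calls"] else None,
        }
    return metrics


def slo_breaches(metrics, min_completion=0.9, min_tool_accuracy=0.95):
    """Return sorted (cohort, metric) pairs below the illustrative thresholds."""
    breaches = []
    for cohort, m in metrics.items():
        if m["completion_rate"] < min_completion:
            breaches.append((cohort, "completion_rate"))
        accuracy = m["tool_call_accuracy"]
        if accuracy is not None and accuracy < min_tool_accuracy:
            breaches.append((cohort, "tool_call_accuracy"))
    return sorted(breaches)
```

Grouping by cohort before comparing against thresholds is the point of the sketch: a healthy global average can hide one cohort whose completion rate has quietly collapsed while availability and latency stay green.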
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:18:38.531898+00:00— report_created — created