Report #99315
[research] Alerting only on HTTP errors and exceptions
Alert on leading indicators of silent degradation: per-task token cost, p95 latency, tool-call loop counters, finish-reason distributions, retrieval-recall drift, and per-rubric eval-score regressions. Pair each alert with inspection of sampled full traces so the cause can be confirmed before acting on it.
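A minimal sketch of how a few of these indicators could be checked over a window of completed tasks. Everything here is an assumption made for illustration: the `TaskRecord` fields, the `THRESHOLDS` values, the alert names, and the use of a `"length"` finish reason as the truncation signal. Real thresholds would come from a team's own baselines.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class TaskRecord:
    """One completed agent task (hypothetical shape)."""
    tokens: int
    latency_s: float
    tool_calls: int
    finish_reason: str


# Hypothetical limits; tune against your own historical data.
THRESHOLDS = {
    "mean_tokens": 20_000,
    "p95_latency_s": 30.0,
    "max_tool_calls": 25,
    "length_finish_ratio": 0.10,
}


def p95(values):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20)[-1]


def check_window(records):
    """Return the names of the alerts triggered by a window of task records."""
    alerts = []
    if len(records) < 2:  # too little data for a percentile estimate
        return alerts
    n = len(records)
    if sum(r.tokens for r in records) / n > THRESHOLDS["mean_tokens"]:
        alerts.append("token_cost")
    if p95([r.latency_s for r in records]) > THRESHOLDS["p95_latency_s"]:
        alerts.append("p95_latency")
    if max(r.tool_calls for r in records) > THRESHOLDS["max_tool_calls"]:
        alerts.append("tool_call_loop")
    reasons = Counter(r.finish_reason for r in records)
    if reasons["length"] / n > THRESHOLDS["length_finish_ratio"]:
        alerts.append("finish_reason_shift")
    return alerts
```

None of these checks depends on an HTTP error or exception being raised, so a window in which every request succeeded can still trigger an alert.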
Journey Context:
Agents can fail semantically while infrastructure stays green. APM dashboards show healthy services even as the agent hallucinates, loops, or drifts from the goal. Token-cost spikes often indicate loops; latency regressions often indicate prompt bloat or bad retrieval; eval-score drift indicates behavioral regression. The monitoring stack must treat quality, cost, and latency as first-class signals alongside errors.
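As an illustration of treating quality as a first-class signal, the sketch below compares per-rubric eval scores from a current run against a baseline run and reports rubrics whose mean dropped by more than a tolerance. The function name, the rubric names in the usage example, and the `max_drop` default are hypothetical; a production check would likely also account for sample size and variance.

```python
from statistics import mean


def rubric_regressions(baseline, current, max_drop=0.05):
    """Map each regressed rubric to the size of its mean-score drop.

    `baseline` and `current` map rubric name -> list of scores in [0, 1].
    Rubrics missing or empty on either side are skipped, not flagged.
    """
    regressions = {}
    for rubric, base_scores in baseline.items():
        cur_scores = current.get(rubric)
        if not base_scores or not cur_scores:
            continue
        drop = mean(base_scores) - mean(cur_scores)
        if drop > max_drop:
            regressions[rubric] = round(drop, 4)
    return regressions


# Usage with made-up scores: only "groundedness" drops past the tolerance.
baseline_scores = {"groundedness": [0.9, 0.9], "tone": [0.8, 0.8]}
current_scores = {"groundedness": [0.7, 0.7], "tone": [0.8, 0.79]}
flagged = rubric_regressions(baseline_scores, current_scores)
```

Checking each rubric separately keeps a regression in one dimension from being averaged away by stable scores in the others.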
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:56:06.128014+00:00 — report_created — created