Report #100244
[research] Why do my APM dashboards stay green while my agent is clearly degrading?
Layer three observability planes. APM covers request latency and infra health. LLM observability covers individual model calls, token counts, and per-call eval scores. Agent observability covers the full task lifecycle: decision trees, tool chains, compounding cost, and behavioral drift. Correlate them by propagating trace context across all three.
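As a rough illustration of propagating trace context across the three planes, the sketch below passes one trace id from the incoming request to the agent run and on to a model call, using the W3C `traceparent` header layout (version, trace id, parent span id, flags). The helper names (`new_traceparent`, `child_traceparent`, `trace_id_of`) and the hop sequence are assumptions made for this example, not part of any particular SDK. In practice a tracing library would normally handle this.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a trace at the edge (hypothetical helper, APM plane)."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(parent: str) -> str:
    """Keep the trace id and flags, mint a new span id for the next hop."""
    match = _TRACEPARENT.match(parent)
    if match is None:
        raise ValueError(f"malformed traceparent: {parent!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace id, which is the join key across the planes."""
    match = _TRACEPARENT.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    return match.group(1)


# Hypothetical hop sequence: HTTP request -> agent run -> model call.
request_ctx = new_traceparent()
agent_ctx = child_traceparent(request_ctx)
llm_ctx = child_traceparent(agent_ctx)
```

Because all three headers carry the same trace id, a slow request in APM can be joined to the agent run and the individual model calls behind it.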
Journey Context:
APM and LLM observability are necessary but not sufficient for agents. APM says the request was slow; LLM observability says which model call was slow; agent observability says the agent made nine calls because a tool kept failing and the task cost $8.40 instead of $0.30. Many production incidents show up first as a behavior change, a cost spike, or a loop, not as an error code. The fix is to model the agent run as a first-class trace with `invoke_agent` parent spans and `execute_tool` children, attach cost and quality scores to the run, and set SLOs on task success rate, steps per task, and cost per task in addition to latency. This requires instrumenting the agent framework, not just the model client.
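The run model described above can be sketched with plain dataclasses, standing in for a real tracing SDK. Only the span names `invoke_agent` and `execute_tool` and the $0.30 / $8.40 / nine-call figures come from the text. The `Span`, `AgentRun`, and `slo_report` names, the per-span cost split, and the SLO thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str                      # "invoke_agent" or "execute_tool"
    cost_usd: float = 0.0          # illustrative per-span cost
    ok: bool = True
    children: List["Span"] = field(default_factory=list)


@dataclass
class AgentRun:
    root: Span                     # the invoke_agent parent span
    task_succeeded: bool
    quality_score: Optional[float] = None

    @property
    def steps(self) -> int:
        return len(self.root.children)

    @property
    def cost_usd(self) -> float:
        return self.root.cost_usd + sum(c.cost_usd for c in self.root.children)

    @property
    def failed_tool_calls(self) -> int:
        return sum(1 for c in self.root.children if not c.ok)


def slo_report(runs: List[AgentRun], max_steps: int, max_cost_usd: float) -> Dict[str, float]:
    """Aggregate task-level SLO indicators over a batch of runs."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    return {
        "task_success_rate": sum(r.task_succeeded for r in runs) / n,
        "mean_steps_per_task": sum(r.steps for r in runs) / n,
        "mean_cost_per_task_usd": round(sum(r.cost_usd for r in runs) / n, 2),
        "runs_over_budget": sum(
            1 for r in runs if r.steps > max_steps or r.cost_usd > max_cost_usd
        ),
    }


# Healthy run: one tool call, about $0.30 in total (cost split is assumed).
healthy = AgentRun(
    root=Span("invoke_agent", cost_usd=0.20,
              children=[Span("execute_tool", cost_usd=0.10)]),
    task_succeeded=True,
)

# Degraded run: the tool fails eight times before succeeding, about $8.40 in total.
looping = AgentRun(
    root=Span("invoke_agent", cost_usd=0.30,
              children=[Span("execute_tool", cost_usd=0.90, ok=(i == 8))
                        for i in range(9)]),
    task_succeeded=True,
)

report = slo_report([healthy, looping], max_steps=4, max_cost_usd=1.00)
```

Both runs report success, so a success-only or latency-only view could stay green here. The steps and cost indicators are what flag the looping run.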
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:54:05.614279+00:00— report_created — created