Report #100019
[synthesis] Failure in one agent contaminates input for the next, but aggregate metrics blame the wrong component
Evaluate each agent in isolation first, then evaluate the full orchestrated system. Instrument handoff accuracy, shared-memory writes, and router decisions. Alert when component-level scores are healthy but system-level scores drop, because that gap points to integration failures (handoffs, shared state, routing) rather than to any single agent.
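The alert rule above can be sketched as a comparison between two evaluation runs. This is a minimal illustration, not a prescribed implementation: the `EvalSnapshot` schema, the `integration_gap_alert` function, and both thresholds (`component_floor`, `max_system_drop`) are hypothetical names and values chosen for the example, and scores are assumed to be normalized to [0, 1].

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EvalSnapshot:
    """Scores from one evaluation run (hypothetical schema, values in [0, 1])."""
    component_scores: dict[str, float]  # isolated per-agent eval scores
    system_score: float                  # end-to-end orchestrated eval score


def integration_gap_alert(
    baseline: EvalSnapshot,
    current: EvalSnapshot,
    component_floor: float = 0.9,   # assumed "healthy" threshold per agent
    max_system_drop: float = 0.05,  # assumed tolerated end-to-end regression
) -> str | None:
    """Return an alert message when every component is at or above the floor
    but the end-to-end score regressed by more than max_system_drop."""
    unhealthy = [n for n, s in current.component_scores.items() if s < component_floor]
    if unhealthy:
        # A component-level failure already explains the drop; this rule only
        # fires for the "all parts healthy, whole degraded" case.
        return None
    drop = baseline.system_score - current.system_score
    if drop > max_system_drop:
        return (f"integration gap: all {len(current.component_scores)} components >= "
                f"{component_floor}, but system score fell by {drop:.2f}")
    return None


baseline = EvalSnapshot({"planner": 0.95, "retriever": 0.93, "writer": 0.94}, 0.88)
current = EvalSnapshot({"planner": 0.96, "retriever": 0.92, "writer": 0.94}, 0.74)
print(integration_gap_alert(baseline, current))
# prints: integration gap: all 3 components >= 0.9, but system score fell by 0.14
```

Returning `None` when a component is unhealthy is a design choice: in that case the regular per-component alert should route the issue to that agent, and this rule stays reserved for the integration gap.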
Journey Context:
In multi-agent systems, an error in one agent becomes the next agent's context, and shared memory can be overwritten non-deterministically. Component tests often pass while the system fails. The synthesis is that failure attribution requires both perspectives: isolated component evals and end-to-end trajectory evals. The gap between them is where cascade failures hide.
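One way to separate a cascade's origin from its downstream symptoms is to record, at every handoff, whether the agent's input and output each passed validation. The sketch below assumes such a trace already exists; the `Handoff` record, the `attribute_failures` function, and the idea of a per-handoff validity check (for example a schema or reference check) are hypothetical and stand in for whatever instrumentation the system actually has.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Handoff:
    """One step in an orchestrated trajectory (hypothetical trace schema)."""
    agent: str
    input_valid: bool   # did the context this agent received pass validation?
    output_valid: bool  # did this agent's output pass validation?


def attribute_failures(trajectory: list[Handoff]) -> dict[str, list[str]]:
    """Split failing agents into originators (bad output from valid input)
    and propagators (bad output from input that was already invalid)."""
    result: dict[str, list[str]] = {"originated": [], "propagated": []}
    for step in trajectory:
        if step.output_valid:
            continue
        key = "originated" if step.input_valid else "propagated"
        result[key].append(step.agent)
    return result


trace = [
    Handoff("router", input_valid=True, output_valid=True),
    Handoff("retriever", input_valid=True, output_valid=False),
    Handoff("writer", input_valid=False, output_valid=False),
]
print(attribute_failures(trace))
# prints: {'originated': ['retriever'], 'propagated': ['writer']}
```

A metric that only scores the final output would blame `writer` here, while the trace shows the error originated in `retriever`. Input validity is recorded per step rather than copied from the previous agent's output because, with shared memory, an agent's context may also include writes from agents outside the direct handoff chain.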
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:27:18.040375+00:00 | report_created | created