Report #99090
[synthesis] Errors in early agent steps silently corrupt final outputs in multi-agent chains
Track claim-level factual consistency across agent transitions and design recovery checkpoints, instead of relying only on final-output quality scores.
Journey Context:
In multi-agent cascades, later agents refine the outputs of earlier ones. This can suppress hallucinations, but it can also introduce factual decay and overcorrection. Empirical work reports that 54.6% of claim transitions are corrected or weakened, 7.3% are amplified, and 15.2% are deleted or overcorrected. A final-output evaluator may assign a decent score, yet it cannot tell whether the answer is right because it was grounded or merely looks safe because useful but risky detail was removed. Multi-agent reliability therefore requires trajectory-level claim tracking, not just endpoint scoring, so that teams can see where errors originate and how they transform.
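A minimal sketch of what trajectory-level claim tracking could look like, assuming claims have already been extracted and aligned across agent steps under stable IDs. The names `ClaimState`, `classify_transition`, and `track_trajectory`, the `supported` and `strength` fields, and the label set (including the extra "decayed" and "preserved" labels) are all hypothetical choices for illustration, not part of any existing tool or of the study cited above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClaimState:
    """State of one claim after one agent step (hypothetical schema)."""
    supported: bool   # assumed to come from an external grounding check
    strength: float   # how assertively the claim is stated, 0.0 to 1.0


def classify_transition(before: ClaimState, after: Optional[ClaimState]) -> str:
    """Label how a single claim changed between two consecutive agents."""
    if after is None:
        return "deleted"
    if not before.supported and after.supported:
        return "corrected"
    if before.supported and not after.supported:
        return "decayed"
    if after.strength < before.strength:
        return "weakened"
    if after.strength > before.strength:
        return "amplified"
    return "preserved"


def track_trajectory(stages: list[dict[str, ClaimState]]) -> list[tuple[int, str, str]]:
    """Return (step, claim_id, label) for every claim present before each step.

    Claims first introduced by a later agent are only tracked from the
    step after they appear.
    """
    events = []
    for step in range(1, len(stages)):
        prev, curr = stages[step - 1], stages[step]
        for claim_id, before in prev.items():
            events.append((step, claim_id, classify_transition(before, curr.get(claim_id))))
    return events


# Illustrative three-agent chain with made-up claim states.
stages = [
    {"c1": ClaimState(False, 0.9), "c2": ClaimState(True, 0.8), "c3": ClaimState(False, 0.5)},
    {"c1": ClaimState(True, 0.9), "c2": ClaimState(True, 0.8), "c3": ClaimState(False, 0.9)},
    {"c1": ClaimState(True, 0.9), "c3": ClaimState(False, 0.9)},
]
events = track_trajectory(stages)
summary = Counter(label for _, _, label in events)
```

In this toy chain, a final-output score would not reveal that the supported claim `c2` was dropped at step 2 or that the unsupported claim `c3` was stated more assertively at step 1. The per-step event list makes both visible, and a "deleted" event on a supported claim is a natural place to hang a recovery checkpoint.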
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:17:30.792403+00:00— report_created — created