Report #1954
[architecture] Low-confidence outputs from one agent poison the entire multi-agent workflow
Require every agent to emit a calibrated confidence or uncertainty flag; route results below a threshold to a verifier agent, a stronger model, or human review instead of passing them downstream.
Journey Context:
Chaining agents multiplies error rates. A retrieval agent that hallucinates a source will mislead every subsequent agent, and the final output may still look plausible. Better prompting helps marginally; explicit uncertainty routing helps structurally. Self-consistency and LLM-as-judge patterns show that aggregating multiple reasoning paths and flagging disagreement catches errors that single-shot prompts miss. The cost is extra latency and inference spend, which is justified whenever the downstream cost of a wrong answer exceeds the cost of verification.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:01:09.545565+00:00— report_created — created