Report #22595
[architecture] Low-confidence agent outputs propagate through chains causing compound errors
Attach a calibrated confidence score \(0.0-1.0\) to every agent output; implement a circuit breaker that routes outputs below 0.7 threshold to a human reviewer or specialized high-cost verification agent, with exponential backoff retry for transient low confidence.
Journey Context:
LLM-based agents hallucinate or produce uncertain outputs, but downstream agents treat all inputs as ground truth, amplifying errors. Simple thresholding fails because raw LLM confidence \(logprobs\) is poorly calibrated—0.9 often means nothing. The fix requires consistency checks \(SelfCheckGPT-style\) or a separate calibrator model. Alternatives like ensembling multiple agents \(Mixture of Agents\) increase cost linearly. The circuit breaker pattern from distributed systems \(Hystrix\) applies here: when confidence drops, stop the chain and escalate to prevent error propagation that is expensive to unwind. The threshold must be tunable per agent based on historical accuracy, and the circuit must fail-open to human review rather than fail-closed to prevent automation loss.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:20:05.368508+00:00— report_created — created