Report #25342
[architecture] Agents silently proceed with low-confidence outputs leading to compounding hallucinations
Require agents to emit an explicit confidence score (0.0-1.0) or a discrete status (e.g., SUCCESS, UNCERTAIN, FAIL) alongside their primary output. Configure the orchestrator to route UNCERTAIN outputs to a verification agent or a human-in-the-loop instead of passing them to the next workflow step.
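A minimal sketch of the routing rule described above, in Python. The names `Status`, `AgentResult`, and `route`, and the destination labels, are hypothetical and not taken from any particular orchestration framework. Sending FAIL to "abort" is an assumption; the report only specifies what happens to UNCERTAIN outputs.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUCCESS = "SUCCESS"
    UNCERTAIN = "UNCERTAIN"
    FAIL = "FAIL"


@dataclass
class AgentResult:
    output: str        # the agent's primary output
    status: Status     # discrete self-assessment
    confidence: float  # self-reported score, 0.0-1.0


def route(result: AgentResult) -> str:
    """Return the (hypothetical) destination for an agent result."""
    if result.status is Status.SUCCESS:
        return "next_step"
    if result.status is Status.UNCERTAIN:
        # Divert to a verification agent or human review instead of continuing.
        return "verification"
    # Assumption: a FAIL result stops the workflow.
    return "abort"
```

In a real orchestrator the returned label would map to a queue, a handler, or a graph edge; the point is only that the status field, not the primary output, decides where the result goes.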
Journey Context:
LLMs tend toward sycophancy and can state wrong answers with full confidence. In a linear chain (Agent A -> Agent B -> Agent C), a low-confidence hallucination from Agent A is accepted as truth by Agent B, and the error compounds at every later step. Forcing each agent to self-assess, and structuring its output to carry that assessment, gives the orchestrator a point at which it can break the chain. The tradeoff is that LLM self-assessed confidence is imperfect and often miscalibrated, so it will miss some bad outputs and flag some good ones. Even so, it acts as a necessary circuit breaker that reduces the blast radius of bad generations.
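The chain-breaking behaviour can be illustrated with a small Python sketch. Everything here is hypothetical: the `run_chain` helper, the three stub agents, and the 0.7 threshold are illustrative choices, and because self-reported confidence is often miscalibrated, any real threshold would need tuning against observed outcomes.

```python
from typing import Callable, List, Optional, Tuple

# Each hypothetical agent takes the previous output and returns
# (new_output, self_reported_confidence in 0.0-1.0).
Agent = Callable[[str], Tuple[str, float]]


def run_chain(
    agents: List[Agent], initial: str, threshold: float = 0.7
) -> Tuple[str, Optional[int]]:
    """Run agents in order and halt at the first low-confidence output.

    Returns (last_trusted_output, index_of_halting_agent or None).
    """
    current = initial
    for i, agent in enumerate(agents):
        output, confidence = agent(current)
        if confidence < threshold:
            # Circuit breaker: do not hand this output to the next agent.
            return current, i
        current = output
    return current, None


# Stub agents standing in for real LLM calls.
def agent_a(text: str) -> Tuple[str, float]:
    return text + " -> A", 0.9


def agent_b(text: str) -> Tuple[str, float]:
    return text + " -> B", 0.4  # low self-reported confidence


def agent_c(text: str) -> Tuple[str, float]:
    return text + " -> C", 0.95


result, halted_at = run_chain([agent_a, agent_b, agent_c], "input")
```

Here Agent B reports a confidence below the threshold, so the chain stops before Agent C runs and the last trusted output is Agent A's. The halting index tells the orchestrator which step to send for verification.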
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:56:37.553802+00:00— report_created — created