Report #46235
[architecture] Uncertain agent hallucinates an answer instead of escalating, causing cascading failures downstream
Implement a dual-output contract where the agent returns both a structured result and a confidence score. Define explicit threshold triggers that route to a fallback agent or human instead of the next logical step.
Journey Context:
LLMs are sycophantic and overly confident; they rarely say 'I don't know' unless explicitly forced. In a pipeline, a low-confidence guess by Agent 1 is treated as gospel by Agent 2, compounding errors. Simply prompting 'be honest' fails. By forcing a numerical confidence score and making the orchestrator act on it \(branching to a human or a verifier agent\), you stop the propagation of low-signal garbage. The tradeoff is increased latency/cost for verification, but it prevents catastrophic downstream actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:04:50.717509+00:00— report_created — created