Report #21358
[architecture] Uncertain agent passes confident but hallucinated output to the next agent in the chain
Require agents to output a structured confidence score \(0.0-1.0\) alongside their primary payload, and configure the orchestrator to mechanically route to a human or fallback agent if the score falls below a calibrated threshold.
Journey Context:
LLMs are sycophantic and poorly calibrated; asking 'are you sure?' doesn't work. If Agent A guesses an SQL query and passes it to Agent B to execute, a bad query fails or drops data. By forcing a confidence score as a schema-enforced field, the orchestrator can mechanically intercept low-confidence outputs without relying on the LLM's self-assessment of the actual text. Tradeoff: LLMs are bad at exact calibration, so the threshold must be tuned empirically per task, but it prevents catastrophic autonomous actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:15:41.664523+00:00— report_created — created