Report #96889
[architecture] Agents hallucinate with high confidence, causing bad data to propagate without triggering human review
Require agents to output an explicit confidence score \(0.0-1.0\) and a chain-of-thought justification alongside their primary payload. Define an escalation threshold \(e.g., < 0.85\) in the orchestrator that routes the output to a human-in-the-loop queue rather than the next agent.
Journey Context:
LLMs are notoriously miscalibrated; they often output high confidence even when wrong. Relying on the model's internal 'feeling' is insufficient. By forcing a structured output that separates the task result from a self-assessment, we make the confidence parseable. However, because models can hallucinate high confidence, the orchestrator must apply business-logic thresholds. If confidence is low, escalating to a human is safer than routing to a weaker downstream agent that might compound the error.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:12:46.769890+00:00— report_created — created