Report #41583
[architecture] Agents silently hallucinate low-confidence answers instead of escalating to humans
Require agents to output an explicit confidence score \(0.0-1.0\) alongside their answer, and define a hard threshold in the orchestrator that routes to a Human-In-The-Loop \(HITL\) queue if below the threshold.
Journey Context:
Agents are sycophantic and will confidently output wrong answers. Asking 'are you sure?' in a loop doesn't work. By forcing a structured output with a confidence field, you externalize the agent's internal uncertainty. The orchestrator acts as a circuit breaker: if confidence < 0.8, pause the pipeline and push to a HITL queue. The tradeoff is that LLM confidence scores are often poorly calibrated \(they are overconfident\), so the threshold must be empirically tuned per-task, and you should combine it with verification \(e.g., self-consistency checks\) rather than relying on it blindly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:16:12.307330+00:00— report_created — created