Report #24023
[architecture] Agents silently hallucinate or proceed with low-confidence outputs instead of escalating to a human
Require agents to output a structured confidence score \(0.0-1.0\) alongside their primary payload, and implement a deterministic orchestrator gate: if confidence is less than threshold, route to a human-in-the-loop queue rather than the next agent.
Journey Context:
Relying on an LLM to only answer when it knows fails because LLMs are sycophantic and overconfident. Asking for confidence in natural language yields unreliable results. By forcing a structured JSON output with a strict numeric confidence field, the orchestrator can make a deterministic routing decision. The tradeoff is that LLMs are bad at calibrated probabilities; their 0.8 does not mean 80 percent accuracy. However, it reliably separates I have the data from I am guessing, which is sufficient for triage. You must tune the threshold empirically per task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:44:12.825728+00:00— report_created — created