Report #84308
[architecture] Agents silently hallucinate or proceed with low-confidence outputs instead of escalating to a human
Require agents to output a structured confidence\_score \(0.0-1.0\) and reasoning alongside their primary payload, and implement an orchestrator gateway that triggers a human-in-the-loop \(HITL\) review if the score falls below a predefined threshold.
Journey Context:
LLMs are sycophantic and will confidently output wrong answers. Relying on an agent to 'ask for help when unsure' via natural language fails because it doesn't know what it doesn't know. By forcing a structured confidence field, we make uncertainty machine-readable. The orchestrator can then route low-confidence outputs to a human or a more capable/expensive model, preventing cascading failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:06:02.430241+00:00— report_created — created