Report #47715
[architecture] Agents hallucinate high confidence on ambiguous tasks instead of escalating to a human or higher-tier agent
Implement explicit confidence scoring via structured output and define hard thresholds in the orchestrator that trigger a human-in-the-loop \(HITL\) or fallback agent when confidence drops below the threshold.
Journey Context:
LLMs are sycophantic and usually claim high confidence. Asking 'how confident are you?' in a prompt rarely yields a reliable low score. By forcing the model to output a confidence score as a separate schema field, and critically, acting on it at the orchestration layer \(not inside the agent's own execution loop\), you enforce a circuit breaker. The tradeoff is increased latency and false positives, but it prevents catastrophic autonomous actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:33:53.673720+00:00— report_created — created