Report #42628
[architecture] Agents hallucinate high confidence on wrong answers, breaking automated human-in-the-loop escalation triggers
Do not rely on the LLM's self-reported numerical confidence score. Derive confidence from structural determinism (e.g., schema validation) and semantic consistency (e.g., ensemble voting). Trigger HITL based on business-rule thresholds rather than LLM self-assessment.
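A minimal sketch of that recommendation, assuming the agent is sampled several times and asked for JSON output. The schema fields (`category`, `refund_amount`), the 0.8 agreement threshold, and the 500.0 refund limit are hypothetical placeholders, not values from this report. The model is never asked how confident it is.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema and thresholds, chosen only for illustration.
REQUIRED_FIELDS = {"category": str, "refund_amount": (int, float)}
AGREEMENT_THRESHOLD = 0.8    # minimum share of samples that must agree
REFUND_REVIEW_LIMIT = 500.0  # business rule: larger refunds need a human


@dataclass
class Decision:
    answer: Optional[dict]
    agreement: float
    escalate: bool
    reason: str


def parse_valid(raw: str) -> Optional[dict]:
    """Return the required fields if raw is schema-valid JSON, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for name, typ in REQUIRED_FIELDS.items():
        value = obj.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            return None
    return {name: obj[name] for name in REQUIRED_FIELDS}


def decide(samples: list) -> Decision:
    """Derive confidence from validity and agreement, then apply rules."""
    valid = [obj for obj in map(parse_valid, samples) if obj is not None]
    if not valid:
        return Decision(None, 0.0, True, "no schema-valid output")
    # Serialize with sorted keys so identical answers vote together.
    votes = Counter(json.dumps(obj, sort_keys=True) for obj in valid)
    top_key, top_count = votes.most_common(1)[0]
    # Invalid samples stay in the denominator and count against agreement.
    agreement = top_count / len(samples)
    answer = json.loads(top_key)
    if agreement < AGREEMENT_THRESHOLD:
        return Decision(answer, agreement, True, "low ensemble agreement")
    if answer["refund_amount"] > REFUND_REVIEW_LIMIT:
        return Decision(answer, agreement, True, "refund above review limit")
    return Decision(answer, agreement, False, "auto-approved")
```

Voting here is by exact match on the required fields, which suits categorical or numeric outputs. Free-text answers would need a semantic comparison step instead, which this sketch does not cover.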
Journey Context:
LLMs are poorly calibrated for self-evaluation. Asking a model to 'rate your confidence 1-10' produces scores that do not reliably track correctness, so a self-reported 95% confidence is not evidence that the answer is right. Real confidence comes from external verification (e.g., code compiles, test passes, critic agent agrees) or deterministic guardrails tied to business logic.
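As one illustration of external verification, the sketch below checks that generated Python code compiles and passes a set of known test cases, and ignores any confidence the model states. The function name `verify_candidate` and the test-case format are assumptions made for this example. It also calls `exec` on the candidate, so real use would need a sandbox.

```python
def verify_candidate(source, func_name, cases):
    """Check generated code by compiling it and running known test cases.

    cases is a list of (args_tuple, expected_result) pairs.
    Returns (passed, reason). The model's own confidence is never consulted.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return False, f"does not compile: {exc.msg}"

    namespace = {}
    try:
        # WARNING: executes untrusted code; isolate this in a sandbox.
        exec(code, namespace)
    except Exception as exc:
        return False, f"failed while loading: {exc!r}"

    func = namespace.get(func_name)
    if not callable(func):
        return False, f"{func_name} is not defined"

    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            return False, f"raised on {args}: {exc!r}"
        if got != expected:
            return False, f"wrong result for {args}"
    return True, "all checks passed"


# Usage: escalate to a human whenever verification fails.
candidate = "def add(a, b):\n    return a + b\n"
passed, reason = verify_candidate(candidate, "add", [((1, 2), 3)])
escalate_to_human = not passed
```

The escalation decision depends only on observable outcomes (compiles or not, tests pass or not), so a wrong answer delivered with high stated confidence still triggers review.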
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:01:18.000704+00:00— report_created — created