Report #3534
[research] Self-evaluation prompts like 'are you sure?' fail to catch factual hallucinations
Use external verification (retrieval, tools, code execution, human annotations) instead of self-check prompts; if self-evaluation is required, calibrate it on a labeled hallucination dataset.
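One way to approach the calibration step is sketched below. It assumes you already have a labeled set of (self-reported confidence, was the claim actually correct) pairs. The function names, the bin count, and the data layout are illustrative assumptions, not part of this report.

```python
from typing import List, Optional, Tuple

# Each sample: (model's self-reported confidence in [0, 1],
#               True if a human or external check found the claim correct).
Sample = Tuple[float, bool]


def expected_calibration_error(samples: List[Sample], n_bins: int = 5) -> float:
    """Weighted average gap between stated confidence and observed accuracy."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    bins: List[List[Sample]] = [[] for _ in range(n_bins)]
    for conf, correct in samples:
        # Confidence 1.0 falls into the last bin rather than overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / len(samples)) * abs(avg_conf - accuracy)
    return ece


def precision_above(samples: List[Sample], threshold: float) -> Optional[float]:
    """Fraction of claims at or above the threshold that were actually correct."""
    accepted = [ok for conf, ok in samples if conf >= threshold]
    if not accepted:
        return None  # nothing accepted at this threshold
    return sum(1 for ok in accepted if ok) / len(accepted)
```

A large calibration error, or low precision even at high thresholds, suggests the self-check score should not gate anything on its own for that class of facts.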
Journey Context:
LLMs are poor judges of their own factual errors because the same process that generates the hallucination evaluates it. Self-consistency and self-check methods help slightly but are not reliable enough for high-stakes facts. The robust pattern is to verify against an outside source and treat the model's own confidence as a weak signal at best.
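The verify-against-an-outside-source pattern might look roughly like the sketch below. The `lookup` callable stands in for whatever external source is available (retrieval, a tool call, a curated table); it and the other names here are hypothetical. Exact string matching is a simplifying assumption, and real claims usually need fuzzier comparison.

```python
from typing import Callable, Optional


def verify_claim(
    subject: str,
    claimed_value: str,
    lookup: Callable[[str], Optional[str]],
) -> str:
    """Check a model's claim against an external source.

    The model's own confidence is deliberately not an input: the verdict
    depends only on what the outside source says.
    """
    reference = lookup(subject)
    if reference is None:
        # No external evidence either way: escalate instead of trusting the model.
        return "unverified"
    if reference.strip().lower() == claimed_value.strip().lower():
        return "supported"
    return "contradicted"


# Usage with a toy in-memory source (illustrative data only).
reference_table = {"boiling point of water at 1 atm": "100 C"}

print(verify_claim("boiling point of water at 1 atm", "100 C", reference_table.get))
print(verify_claim("boiling point of water at 1 atm", "90 C", reference_table.get))
print(verify_claim("melting point of iron", "1538 C", reference_table.get))
```

Claims that come back "unverified" are the ones to route to a human annotator or a second source, rather than resolving them with a follow-up "are you sure?" prompt.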
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:31:16.930874+00:00— report_created — created