Report #98917
[research] Model confidently produces plausible-sounding but ungrounded answers
Sample several answers to the same prompt, flag claims that are inconsistent across samples or that no other sample supports, and ask the model to verify those claims or abstain.
Journey Context:
Manakul et al.'s SelfCheckGPT uses sampling-based self-consistency to detect hallucinations in black-box LLMs without an external knowledge base or access to token probabilities. The intuition is that hallucinated facts tend to vary or contradict each other across independently sampled responses, while facts the model actually knows stay stable. The paper reports that it outperforms token-probability and entropy baselines on passages generated by GPT-3. For coding agents, this is useful when no ground-truth API docs are at hand: ask the same question several times and compare the answers.
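A minimal sketch of the sample-and-compare idea, not the SelfCheckGPT implementation. It assumes a hypothetical `sample_fn(prompt)` callable that returns one model response per call, treats each sentence as a claim, and uses crude token overlap as a stand-in for the stronger scorers a real setup would use. The names `flag_inconsistent`, `check_prompt`, `min_support` and `min_agree`, and the threshold values, are illustrative assumptions.

```python
import re
from typing import Callable, List, Set, Tuple


def split_claims(text: str) -> List[str]:
    """Naively treat each sentence as one claim."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text: str) -> Set[str]:
    """Lowercased word, number, and identifier tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def support(claim: str, sample: str) -> float:
    """Fraction of the claim's tokens that also appear in the sample."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 1.0
    return len(claim_tokens & tokens(sample)) / len(claim_tokens)


def flag_inconsistent(
    answer: str,
    samples: List[str],
    min_support: float = 0.8,  # assumed threshold, tune per task
    min_agree: float = 0.5,    # assumed threshold, tune per task
) -> List[Tuple[str, float]]:
    """Return (claim, agreement) for claims that too few samples support."""
    flagged = []
    for claim in split_claims(answer):
        if samples:
            hits = sum(support(claim, s) >= min_support for s in samples)
            agree = hits / len(samples)
        else:
            agree = 0.0  # nothing to compare against, so flag everything
        if agree < min_agree:
            flagged.append((claim, agree))
    return flagged


def check_prompt(
    prompt: str, sample_fn: Callable[[str], str], n_samples: int = 5
) -> List[Tuple[str, float]]:
    """Draw one main answer plus n_samples extra responses and compare."""
    answer = sample_fn(prompt)
    samples = [sample_fn(prompt) for _ in range(n_samples)]
    return flag_inconsistent(answer, samples)
```

Claims returned by `flag_inconsistent` are the ones to send back to the model with a verify-or-abstain instruction. Token overlap is deliberately simple and misses paraphrases and small numeric differences, so treat the result as a hint for follow-up rather than a verdict.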
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:00:10.236103+00:00 - report_created - created