Report #2722
[research] Need to detect hallucinations in a black-box API model without an external database
Sample multiple answers to the same prompt and measure the informational consistency across them (SelfCheckGPT): facts the model actually knows tend to recur across samples, while hallucinated facts tend to diverge.
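A minimal sketch of the sampling-and-consistency idea. It assumes the main answer and a few extra samples have already been fetched from the API with temperature > 0. In place of SelfCheckGPT's BERTScore, QA, or NLI scorers, it uses plain content-word overlap to stay dependency-free: a sentence of the main answer scores high (likely hallucinated) when few samples share its content words. The helper names (`split_sentences`, `support`, `inconsistency_scores`) and the length-based stopword filter are illustrative assumptions, not part of SelfCheckGPT.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text: str) -> set[str]:
    # Lowercased word tokens; words of 3 chars or fewer are dropped as a crude stopword filter.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}

def support(sentence: str, sample: str) -> float:
    # Fraction of the sentence's content words that also appear in one sample.
    words = tokens(sentence)
    if not words:
        return 1.0
    return len(words & tokens(sample)) / len(words)

def inconsistency_scores(main: str, samples: list[str]) -> list[tuple[str, float]]:
    # Per sentence of the main answer: 0.0 = every sample fully agrees,
    # 1.0 = no sample shares its content (likely hallucinated).
    if not samples:
        raise ValueError("need at least one extra sample")
    results = []
    for sent in split_sentences(main):
        mean_support = sum(support(sent, s) for s in samples) / len(samples)
        results.append((sent, 1.0 - mean_support))
    return results

main = "Marie Curie was born in Warsaw. She won a Nobel Prize in Astronomy."
samples = [
    "Marie Curie was born in Warsaw. She won Nobel Prizes in Physics and Chemistry.",
    "Born in Warsaw, Marie Curie won Nobel Prizes in physics and chemistry.",
    "Marie Curie, born in Warsaw, received Nobel Prizes for Physics and Chemistry.",
]
for sent, score in inconsistency_scores(main, samples):
    print(f"{score:.2f}  {sent}")
# 0.00  Marie Curie was born in Warsaw.
# 0.67  She won a Nobel Prize in Astronomy.
```

The second score is partly inflated by the exact-match miss on "Prize" vs "Prizes". That crudeness is the reason the method's real variants compare sentences semantically (BERTScore, NLI, QA, or an LLM judge) rather than by surface tokens.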
Journey Context:
SelfCheckGPT is a zero-resource, black-box method: it needs neither token probabilities nor an external database. In its reported evaluation, it outperformed grey-box baselines (which rely on token probabilities) on sentence-level hallucination detection, with consistency measured via BERTScore, question-answering overlap, NLI, or LLM prompting. Common mistakes are relying on token-level entropy (most APIs do not expose the probabilities it needs) or using a single deterministic generation, which leaves nothing to compare against. The trade-off is extra inference cost (several additional samples per prompt), but for black-box APIs this is often the only practical hallucination signal available.
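A sketch of the LLM-prompting variant's scoring loop, assuming a hypothetical `judge` callable that sends one prompt to whatever chat API is in use and returns the raw reply text. Each sentence is checked against each sample; "Yes" counts as supported (0.0), "No" as unsupported (1.0), anything else as undecided (0.5), and the sentence score is the mean. The prompt wording follows the general shape of SelfCheckGPT's prompting variant, but treat the exact template and the answer mapping as assumptions. The guard at the top rejects the "single deterministic generation" mistake: identical samples carry no consistency signal. `toy_judge` is a hypothetical stand-in so the example runs offline.

```python
from typing import Callable

# Hypothetical judge: wraps a real API call and returns the model's raw reply.
Judge = Callable[[str], str]

PROMPT = (
    "Context: {context}\n"
    "Sentence: {sentence}\n"
    "Is the sentence supported by the context above? Answer Yes or No:"
)

def answer_to_score(reply: str) -> float:
    # Map the first word of the reply: yes -> 0.0, no -> 1.0, otherwise 0.5.
    words = reply.strip().lower().split()
    first = words[0].strip(".,:;!") if words else ""
    if first == "yes":
        return 0.0
    if first == "no":
        return 1.0
    return 0.5

def selfcheck_prompt(sentences: list[str], samples: list[str], judge: Judge) -> list[float]:
    # Fewer than two distinct samples (e.g. temperature 0) gives no signal.
    if len(set(samples)) < 2:
        raise ValueError("need at least two distinct samples; sample with temperature > 0")
    scores = []
    for sentence in sentences:
        votes = [answer_to_score(judge(PROMPT.format(context=s, sentence=sentence)))
                 for s in samples]
        scores.append(sum(votes) / len(votes))
    return scores

def toy_judge(prompt: str) -> str:
    # Offline stand-in: "Yes" if the sentence's last word occurs in the context.
    lines = prompt.split("\n")
    context = lines[0].removeprefix("Context: ")
    sentence = lines[1].removeprefix("Sentence: ")
    return "Yes" if sentence.rstrip(".").split()[-1] in context else "No"

sentences = ["Marie Curie was born in Warsaw.", "She won a Nobel Prize in Astronomy."]
samples = [
    "Marie Curie was born in Warsaw and won Nobel Prizes in Physics and Chemistry.",
    "Born in Warsaw, she won Nobel Prizes in physics and chemistry.",
]
print(selfcheck_prompt(sentences, samples, toy_judge))  # [0.0, 1.0]
```

Each sentence costs one judge call per sample, so this variant multiplies the inference cost noted above; the embedding- or NLI-based variants trade that for a local model.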
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:38:51.893894+00:00— report_created — created