Report #2720
[research] LLM confidently generates plausible but false factual details
Use Chain-of-Verification (CoVe): draft a baseline answer, plan verification questions that probe its factual claims, answer each question independently without seeing the draft, then revise the final answer to resolve any inconsistencies the answers reveal.
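A minimal Python sketch of the four steps follows. It assumes a hypothetical `llm` callable that maps a prompt string to a completion string. The prompt wording, `chain_of_verification`, and the scripted `fake_llm` stub are illustrative only and are not the prompts from the CoVe paper. The stub lets the example run offline and shows a wrong draft claim being corrected.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def chain_of_verification(query: str, llm: LLM) -> str:
    """Sketch of factored CoVe: draft, plan, verify independently, revise."""
    # 1. Draft a baseline answer (may contain hallucinations).
    draft = llm(f"Answer the question.\nQuestion: {query}\nAnswer:")

    # 2. Plan verification questions that probe the draft's factual claims.
    plan = llm(
        "List one verification question per line that would check the facts "
        f"in this answer.\nQuestion: {query}\nAnswer: {draft}\nQuestions:"
    )
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each question in a fresh prompt that never includes the draft,
    #    so the model cannot simply copy its earlier claims.
    answers = [llm(f"Question: {q}\nAnswer:") for q in questions]

    # 4. Revise the draft against the independently obtained answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    return llm(
        "Rewrite the answer so it is consistent with the verified facts; "
        "drop any claim the facts contradict.\n"
        f"Question: {query}\nDraft answer: {draft}\nVerified facts:\n{evidence}\n"
        "Final answer:"
    )


# Scripted stand-in for a real model, recording every prompt it receives.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Answer the question."):
        return "Marie Curie was born in Paris in 1867."  # wrong birthplace
    if prompt.startswith("List one verification question"):
        return "Where was Marie Curie born?\nWhen was Marie Curie born?"
    if prompt.startswith("Question: Where"):
        return "Warsaw"
    if prompt.startswith("Question: When"):
        return "1867"
    return "Marie Curie was born in Warsaw in 1867."  # revision step


final = chain_of_verification("Where and when was Marie Curie born?", fake_llm)
print(final)  # Marie Curie was born in Warsaw in 1867.
```

With a real model, `fake_llm` would be replaced by a call to whatever completion API is in use; the important property is that step 3 builds each prompt from the question alone.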
Journey Context:
CoVe improved FActScore by about 28% relative (55.9→71.4) on long-form generation, with only a modest drop in the number of facts generated. The key insight is that models answer short, focused verification questions more accurately than they answer the original complex query. A joint verification prompt fails because the draft stays in context, so the model attends to its own hallucinations and repeats them; the factored variant answers each verification question in a separate context, which breaks that echo chamber. Common mistake: asking the model to 'check your answer' in the same prompt, which usually rubber-stamps the same errors.
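The difference between the joint and factored variants comes down to what is in the model's context when it answers. The sketch below uses two hypothetical prompt builders, `joint_verification_prompt` and `factored_verification_prompts`, whose wording is an assumption made for illustration. A one-shot "check your answer" prompt has the same shape as the joint prompt, because the draft sits right next to the questions.

```python
from typing import List


def joint_verification_prompt(query: str, draft: str, questions: List[str]) -> str:
    """Joint variant: all verification questions answered in one context
    that still contains the draft, so its errors can be copied forward."""
    qs = "\n".join(questions)
    return (
        f"Question: {query}\nDraft answer: {draft}\n"
        f"Answer each verification question:\n{qs}"
    )


def factored_verification_prompts(questions: List[str]) -> List[str]:
    """Factored variant: one prompt per question, with no draft in context."""
    return [f"Question: {q}\nAnswer:" for q in questions]


draft = "Marie Curie was born in Paris in 1867."  # contains a hallucination
questions = ["Where was Marie Curie born?", "When was Marie Curie born?"]
joint = joint_verification_prompt("Where and when was Marie Curie born?", draft, questions)
factored = factored_verification_prompts(questions)

# The hallucinated "Paris" is visible to the model only in the joint prompt.
print("Paris" in joint)                     # True
print(any("Paris" in p for p in factored))  # False
```

The factored form costs one model call per question instead of one in total. That extra cost is what buys answers that do not condition on the draft.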
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:38:50.310451+00:00 - report_created - created