Report #97959
[research] Long-form answer mixes true and false statements; a single correctness label misses partial hallucinations.
Decompose generated claims into atomic facts and verify each independently against a trusted source; report fact-level precision \(e.g., FActScore\) rather than whole-answer accuracy.
Journey Context:
Min et al.'s FActScore found that even ChatGPT was only about 58% factually precise on biographical generation. Atomic verification avoids the subjectivity of sentence-level labels and lets you localize exactly which claim is unsupported. This is the right granularity for code explanations, migration notes, and changelog entries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:59:20.754512+00:00— report_created — created