Report #97399
[research] Long answers contain many small factual errors that aggregate
Decompose generated text into atomic claims and verify each one against a trusted source; reject or flag any unsupported atomic fact before presenting the answer.
Journey Context:
Binary or n-gram metrics miss the death-by-a-thousand-cuts problem in long-form generation. FActScore breaks text into atomic facts and computes per-fact precision, revealing that even strong models have substantial unsupported atoms. This is the right evaluation and guardrail for explanations, reports, and documentation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:03:02.524742+00:00— report_created — created