Report #65909
[research] LLM starts a long-form generation with factual statements but gradually drifts into hallucinations as the output length increases
Break long-form generation into short segments made up of atomic claims. Generate one segment at a time and verify each claim against a retrieval system before continuing, rather than asking the model to generate the entire text in one pass.
Journey Context:
Models suffer from exposure bias and drift during long autoregressive decoding: they are trained on ground-truth text but, at inference time, condition on their own earlier output. As the context window fills with the model's own generated text, small initial inaccuracies compound and lead the model further from the source material. Evaluating the final text as a whole masks these localized hallucinations. Atomic fact-checking isolates each claim for verification, so an unsupported claim is caught before later text can build on it.
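A minimal sketch of the segment-and-verify loop described above, under stated assumptions. The names generate_segment, is_supported, split_claims, and generate_verified are hypothetical and not part of any library: generate_segment stands in for a call to the model, and is_supported stands in for a lookup against the retrieval system. The sentence-level splitter is deliberately naive; a real pipeline would likely need a better claim extractor.

```python
from typing import Callable, List, Tuple


def split_claims(segment: str) -> List[str]:
    """Naive splitter (assumption): treat each sentence as one atomic claim."""
    return [s.strip() for s in segment.split(".") if s.strip()]


def generate_verified(
    generate_segment: Callable[[List[str]], str],  # hypothetical model call
    is_supported: Callable[[str], bool],           # hypothetical retrieval check
    max_segments: int = 5,
    max_retries: int = 2,
) -> Tuple[List[str], List[str]]:
    """Generate short segments, keeping only those whose claims all verify."""
    accepted: List[str] = []
    rejected: List[str] = []
    for _ in range(max_segments):
        for _attempt in range(max_retries + 1):
            # Only verified text is fed back as context, so an unsupported
            # claim cannot influence later segments.
            segment = generate_segment(accepted)
            if not segment:
                return accepted, rejected  # model has nothing more to say
            bad = [c for c in split_claims(segment) if not is_supported(c)]
            if not bad:
                accepted.append(segment)
                break
            rejected.extend(bad)
        else:
            break  # retries exhausted without a verified segment; stop early
    return accepted, rejected


# Toy stand-ins so the sketch runs without a model or a retrieval backend.
FACTS = {"Paris is the capital of France", "The Seine flows through Paris"}
_script = iter([
    "Paris is the capital of France.",
    "Paris has 40 million residents.",  # unsupported: should be rejected
    "The Seine flows through Paris.",
])


def toy_generate(context: List[str]) -> str:
    return next(_script, "")


def toy_is_supported(claim: str) -> bool:
    return claim in FACTS


accepted, rejected = generate_verified(toy_generate, toy_is_supported)
print(accepted)
print(rejected)
```

The key design choice in this sketch is that rejected segments never enter the context passed to the next generation call, which is what keeps a local error from compounding.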
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:06:31.387043+00:00— report_created — created