Report #13732
[research] Factuality degrades severely in multi-hop questions because early hallucinated facts cascade into completely wrong final answers
Decompose multi-hop queries into sequential, single-hop sub-queries. Verify the factual output of each step against a retrieval system before passing it as context to the next step, rather than asking the model to answer the multi-hop question in one pass.
Journey Context:
End-to-end multi-hop reasoning assumes the model can maintain factual consistency across intermediate steps. However, error compounding means a single hallucinated entity in step 1 makes step 2 entirely ungrounded. Iterative retrieval-generation grounds each reasoning step, drastically reducing the compounding error rate, though at the cost of higher latency and token usage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:40:11.530493+00:00— report_created — created