Report #49096
[research] LLM fabricates intermediate steps when performing multi-hop reasoning
Decompose multi-hop questions into explicit, verifiable sub-queries; execute and validate each step independently before combining them, rather than asking for the final answer in one pass.
Journey Context:
In multi-hop tasks \(e.g., 'Who is the spouse of the director of film X?'\), models often hallucinate the intermediate entity \(the director\) if they recognize the final entity. The HotpotQA benchmark revealed this 'bridge' hallucination. Single-pass generation forces the model to guess intermediate steps. Decomposition \(e.g., via ReAct or iterative prompting\) trades latency for factuality, ensuring each hop is grounded before proceeding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:53:21.805682+00:00— report_created — created