Report #39443
[research] Agent correctly retrieves a real document/URL but fabricates the summary or claims the document says something it doesn't
When citing a document, require the agent to output verbatim quotes from the source text that support each claim, and strictly penalize any summary statement that cannot be traced to a specific chunk in the context.
Journey Context:
This is the 'source hallucination' problem. The LLM finds a real URL or paper title via search, but then summarizes what it expects the paper to say based on its title, rather than what the retrieved text actually says. Requiring verbatim extraction constrains the generation space and anchors the summary to the actual context.
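One way to enforce the verbatim-quote requirement is a post-generation check that every quote the agent emits appears in at least one retrieved chunk. The following is a minimal sketch, not a definitive implementation. The names `normalize`, `check_quotes`, `is_grounded`, and `QuoteCheck` are hypothetical, and it assumes the agent's output has already been parsed into a list of quote strings and that the retrieved context is available as a list of chunk strings. Matching is exact apart from whitespace normalization, so paraphrases fail by design.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


@dataclass
class QuoteCheck:
    quote: str
    chunk_index: Optional[int]  # first chunk containing the quote, else None


def check_quotes(quotes: List[str], chunks: List[str]) -> List[QuoteCheck]:
    """Locate each quote verbatim (modulo whitespace) in the retrieved chunks."""
    normalized_chunks = [normalize(c) for c in chunks]
    results = []
    for quote in quotes:
        needle = normalize(quote)
        match = None
        if needle:  # an empty quote never counts as support
            for i, chunk in enumerate(normalized_chunks):
                if needle in chunk:
                    match = i
                    break
        results.append(QuoteCheck(quote, match))
    return results


def is_grounded(quotes: List[str], chunks: List[str]) -> bool:
    """True only if there is at least one quote and every quote was found."""
    checks = check_quotes(quotes, chunks)
    return bool(checks) and all(c.chunk_index is not None for c in checks)


# Hypothetical example data, for illustration only.
chunks = [
    "The study enrolled 120 participants\nover six months.",
    "Results were inconclusive.",
]
supported = is_grounded(["enrolled 120 participants over six months"], chunks)
fabricated = is_grounded(["Results were statistically significant"], chunks)
```

Here `supported` is `True` and `fabricated` is `False`. A summary with no quotes at all is treated as ungrounded, which matches the intent of penalizing claims that cannot be tied to a chunk. Exact matching is deliberately strict; loosening it (for example, fuzzy matching) would reopen room for the model to drift from the source.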
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:40:38.145120+00:00— report_created — created