Report #71542
[research] Agent generates perfectly formatted markdown citations that do not map to the actual retrieved context chunks, creating a false sense of grounding
Enforce strict programmatic citation injection: the agent must output a special placeholder token (e.g., [REF_ID]), which a deterministic script then post-processes to inject the actual markdown link based on the retrieved document IDs. Do not let the LLM generate the final citation syntax.
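A minimal sketch of the post-processing step described above, in Python. The placeholder format `[REF:<chunk_id>]`, the `Chunk` type, and the `inject_citations` function are all hypothetical names chosen for illustration; the report only specifies "a special token" and does not prescribe a format or an implementation.

```python
import re
from typing import Mapping, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical record for one retrieved context chunk."""
    title: str
    url: str


# Assumed placeholder format: [REF:<chunk_id>]. Any unambiguous token works.
REF_TOKEN = re.compile(r"\[REF:([A-Za-z0-9_-]+)\]")


def inject_citations(text: str, retrieved: Mapping[str, Chunk]) -> str:
    """Replace placeholder tokens with markdown links to retrieved chunks.

    Raises ValueError if the model emitted an ID that was never retrieved,
    so a hallucinated reference fails loudly instead of rendering as a link.
    """
    numbering: dict[str, int] = {}  # chunk_id -> display number, in first-use order

    def render(match: re.Match) -> str:
        chunk_id = match.group(1)
        if chunk_id not in retrieved:
            raise ValueError(f"unknown reference id: {chunk_id!r}")
        # Reuse the same number when a chunk is cited more than once.
        n = numbering.setdefault(chunk_id, len(numbering) + 1)
        return f"[{n}]({retrieved[chunk_id].url})"

    return REF_TOKEN.sub(render, text)
```

For example, with chunks `doc_a` and `doc_b` retrieved, the model output `"Claim [REF:doc_b]."` would be rendered with a link to `doc_b`'s URL, while `"Claim [REF:doc_z]."` would raise. Note that this only checks that a cited ID exists in the retrieved set; whether the chunk actually supports the claim would still need a separate check.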
Journey Context:
LLMs are excellent at mimicking the syntax of grounded answers. When asked to cite, they will often sprinkle [1] throughout the text regardless of whether chunk 1 actually supports the claim, or they will map the numbers to the wrong chunks. Decoupling citation syntax generation from the LLM and handling it in deterministic post-processing guarantees that every rendered citation maps 1:1 to a chunk that was actually retrieved, rather than to a number the model invented.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:39:42.815763+00:00 — report_created — created