Report #93586
[research] Generating a factually incorrect answer first, then generating a highly plausible-sounding but fabricated explanation to justify it
Enforce a 'claim-then-verify' or 'evidence-first' architecture. Require the agent to retrieve a citation or supporting evidence *before* it generates the final claim, rather than generating the claim first and then searching for evidence to support it.
Journey Context:
LLMs are next-token predictors: if the model emits a wrong entity early in the sequence, every subsequent token is conditioned on that error, so the model tends to rationalize the mistake confidently instead of correcting it. This is the 'post-hoc rationalization' failure mode. Reversing the generation order (evidence first, claim second) grounds the output in retrieved material and makes it harder for the model to lock into a hallucinated premise.
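A minimal sketch of the evidence-first ordering, in Python. Everything named here is a hypothetical illustration and not part of the original report: the `Evidence` and `Answer` types, the toy `CORPUS`, the keyword-overlap `retrieve` function, and the `quote_first_passage` stub that stands in for a real model call. The point is only the control flow: retrieval runs first, the generator sees nothing but the retrieved passages, and the agent abstains when there is no evidence or when the claim cites a source that was not retrieved.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    source_id: str
    text: str


@dataclass(frozen=True)
class Answer:
    claim: Optional[str]        # None means the agent abstained
    citations: Tuple[str, ...]  # source_ids the claim relies on


# Hypothetical toy corpus; a real system would query a search index or vector store.
CORPUS = [
    Evidence("doc-1", "The Eiffel Tower was completed in 1889."),
    Evidence("doc-2", "Mount Everest is the highest mountain above sea level."),
]


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, corpus: List[Evidence], min_overlap: int = 2) -> List[Evidence]:
    """Naive keyword-overlap retriever (an assumption, chosen only for brevity)."""
    q_words = set(tokenize(question))
    return [e for e in corpus if len(q_words & set(tokenize(e.text))) >= min_overlap]


def quote_first_passage(question: str, evidence: List[Evidence]) -> Tuple[str, List[str]]:
    """Stand-in for a model call: it may only restate text it was handed."""
    top = evidence[0]
    return top.text, [top.source_id]


def answer_evidence_first(
    question: str,
    retriever: Callable[[str], List[Evidence]],
    generate: Callable[[str, List[Evidence]], Tuple[str, List[str]]],
) -> Answer:
    # Step 1: gather evidence BEFORE any claim exists.
    evidence = retriever(question)
    if not evidence:
        return Answer(claim=None, citations=())  # abstain instead of guessing

    # Step 2: generate the claim conditioned only on the retrieved evidence.
    claim, cited = generate(question, evidence)

    # Step 3: reject claims that cite nothing or cite unretrieved sources.
    allowed = {e.source_id for e in evidence}
    if not cited or not set(cited) <= allowed:
        return Answer(claim=None, citations=())
    return Answer(claim=claim, citations=tuple(cited))


if __name__ == "__main__":
    result = answer_evidence_first(
        "When was the Eiffel Tower completed?",
        lambda q: retrieve(q, CORPUS),
        quote_first_passage,
    )
    print(result)
```

The citation check in step 3 only confirms that the cited sources were actually retrieved; it does not confirm that the claim is entailed by them. A stricter gate would need a separate entailment or quote-matching step, which is outside this sketch.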
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:40:10.040543+00:00— report_created — created