Report #29473
[research] When asked to correct a previous hallucination, the model rephrases the error or hallucinates a new justification
Do not ask the model to 'fix' its own text without new information. Instead, provide the correct fact in the prompt and ask it to rewrite the text incorporating only the provided correction.
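The recommendation above can be sketched as a small prompt builder. This is a minimal illustration, not a reference implementation: the function name `build_correction_prompt`, its parameters, and the prompt wording are all assumptions made for this example, and the resulting string would be passed to whatever model client is in use.

```python
def build_correction_prompt(original_text: str, wrong_claim: str, correct_fact: str) -> str:
    """Build a rewrite prompt that carries the verified fact with it.

    Hypothetical helper: the name, parameters, and wording are
    illustrative assumptions, not part of any library.
    """
    return (
        "The text below contains a factual error.\n"
        f"Incorrect claim: {wrong_claim}\n"
        f"Verified correction: {correct_fact}\n\n"
        "Rewrite the text so that it incorporates only this correction. "
        "Do not add other facts or justifications, and leave everything "
        "else unchanged.\n\n"
        f"Text:\n{original_text}"
    )


# Example usage with a known fact supplied by the caller, not by the model.
prompt = build_correction_prompt(
    original_text="The Eiffel Tower was completed in 1901 for a world's fair.",
    wrong_claim="The Eiffel Tower was completed in 1901.",
    correct_fact="The Eiffel Tower was completed in 1889.",
)
```

The point of the design is that the prompt never says only "that's wrong"; it always supplies the replacement fact and restricts the rewrite to it.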
Journey Context:
LLMs struggle with self-correction without external feedback. When told 'That's wrong, try again,' the model has no new information to work from, so it often just samples a different plausible continuation, which is likely to be another hallucination or a reworded version of the same error. Research indicates that self-correction without external tools or ground truth is largely ineffective for factuality. The agent must be given the ground truth, or a tool to find it, and then be instructed to align the text with that fact.
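One way to enforce this in an agent is to gate every retry on the availability of ground truth. The sketch below assumes a hypothetical `lookup` callable that stands in for whatever retrieval tool or fact store the agent has; the function name `plan_correction` and the returned dictionary shape are likewise illustrative assumptions.

```python
from typing import Callable, Optional


def plan_correction(claim: str, lookup: Callable[[str], Optional[str]]) -> dict:
    """Decide how to handle a claim that was flagged as wrong.

    `lookup` is a hypothetical ground-truth source (search tool,
    database, human reviewer). It returns the verified fact, or None
    when nothing is available.
    """
    fact = lookup(claim)
    if fact is None:
        # No external feedback: a blind "try again" would just resample.
        return {"action": "escalate", "fact": None}
    # Ground truth found: the rewrite step can be anchored to it.
    return {"action": "rewrite", "fact": fact}


# Example usage with an in-memory fact store standing in for a real tool.
facts = {"Eiffel Tower completion year": "1889"}
found = plan_correction("Eiffel Tower completion year", facts.get)
missing = plan_correction("Unknown claim", facts.get)
```

Here `found` selects the rewrite path with the verified fact attached, while `missing` is escalated instead of being sent back to the model for another unanchored attempt.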
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:51:45.169788+00:00— report_created — created