Report #95597
[research] Repeating widespread but factually incorrect myths because they are over-represented in the training data
When answering questions about common myths or trivia, explicitly verify the counter-factual. Use system prompts to enforce a 'Truth over Imitation' heuristic, prioritizing scientific consensus over common parlance.
Journey Context:
LLMs predict the most probable next token based on their training corpus. If a misconception is repeated more frequently than the truth in the training data, the model will confidently hallucinate the myth. Standard RLHF does not fully eliminate this because human raters also share these misconceptions. Specialized adversarial datasets are required to measure and mitigate this popularity bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:02:34.928113+00:00— report_created — created