Report #84903
[research] LLM repeats widely believed but factually incorrect myths
Fine-tune on, or evaluate against, datasets specifically designed to test common misconceptions, and prompt the model to double-check claims that match known high-frequency myth patterns.
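The prompting half of this workaround could be sketched roughly as follows. This is a minimal illustration, not a verified fix: the two regular expressions, the `VERIFY_INSTRUCTION` wording, and the helper names `matches_myth_pattern` and `build_prompt` are all hypothetical, and a real pattern list would need to come from a curated misconception dataset.

```python
import re

# Hypothetical myth patterns, for illustration only. A real deployment would
# derive these from a curated list of high-frequency misconceptions.
MYTH_PATTERNS = [
    re.compile(r"\b10\s*(%|percent)\s+of\s+(the|our|your|their)\s+brain", re.IGNORECASE),
    re.compile(r"great wall.*(visible|seen).*space", re.IGNORECASE),
]

# Assumed wording; tune for the model in use.
VERIFY_INSTRUCTION = (
    "Before answering, check whether the claim in the question is a common "
    "misconception. If it is, say so and give the corrected fact."
)


def matches_myth_pattern(text: str) -> bool:
    """Return True if the text matches any known myth pattern."""
    return any(pattern.search(text) for pattern in MYTH_PATTERNS)


def build_prompt(question: str) -> str:
    """Prepend a double-check instruction only when a myth pattern matches."""
    if matches_myth_pattern(question):
        return f"{VERIFY_INSTRUCTION}\n\nQuestion: {question}"
    return f"Question: {question}"
```

Gating the instruction on a pattern match keeps ordinary prompts unchanged, so any side effects of the extra instruction are limited to questions that resemble known myths. Regex matching will miss paraphrases, so treat it as a first filter only.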
Journey Context:
LLMs trained on internet data absorb the statistical prevalence of popular myths. When a myth appears more often in the training data than its correction, the model's prior favors the myth. Standard RLHF may not fix this if the human raters share the misconception, because they will tend to rate the myth as the correct answer. Specialized adversarial datasets are needed both to measure this behavior and to shift it.
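One way to measure the behavior described above is to compute the fraction of adversarial questions on which the model repeats the myth without the correction. The sketch below assumes a hypothetical `MythItem` record and a `model` callable that maps a question string to an answer string. The substring scoring is deliberately crude and stands in for whatever grading method (human review or a trained judge) a real evaluation would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MythItem:
    """One adversarial question (hypothetical schema, for illustration)."""
    question: str
    myth_marker: str        # substring suggesting the myth was repeated
    correction_marker: str  # substring suggesting the correction was given


def myth_rate(items: Iterable[MythItem], model: Callable[[str], str]) -> float:
    """Fraction of items where the answer has the myth marker but not the correction."""
    items = list(items)
    if not items:
        return 0.0
    repeated = 0
    for item in items:
        answer = model(item.question).lower()
        has_myth = item.myth_marker.lower() in answer
        has_correction = item.correction_marker.lower() in answer
        if has_myth and not has_correction:
            repeated += 1
    return repeated / len(items)
```

Comparing `myth_rate` before and after fine-tuning, or with and without a double-check prompt, gives a rough signal of whether an intervention moved the model away from the myth. Questions used for fine-tuning should be kept separate from those used for measurement.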
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:05:51.330481+00:00— report_created — created