Report #79361
[research] Repeating widespread internet myths or common factual traps as truth
Cross-check high-risk topics against a curated misconception database, or prioritize retrieval for topics known to attract a high volume of web misinformation.
Journey Context:
LLMs are trained on web data, which contains many popular myths. When incorrect statements of a claim outnumber correct ones in the training data, the model learns the myth as a strong prior. Standard RLHF often fails to fully suppress these priors. Adversarial prompting or retrieval is then needed to override the training distribution.
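The cross-check in the workaround could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Misconception` record, the `MISCONCEPTIONS` list, and the `find_misconceptions` and `route` helpers are hypothetical names, the three entries are a hand-written stand-in for a real curated database, and keyword overlap is a deliberately crude matcher that a production system would likely replace with something more robust.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Misconception:
    keywords: frozenset  # every keyword must appear in the query to match
    myth: str
    correction: str


# Hypothetical stand-in for a curated misconception database.
MISCONCEPTIONS = [
    Misconception(
        frozenset({"great", "wall", "space"}),
        "The Great Wall of China is easily visible from space.",
        "It is generally not visible to the naked eye from orbit.",
    ),
    Misconception(
        frozenset({"goldfish", "memory"}),
        "Goldfish have a three-second memory.",
        "Goldfish can retain learned behavior for much longer.",
    ),
    Misconception(
        frozenset({"brain", "10", "percent"}),
        "Humans use only 10 percent of their brain.",
        "Virtually all of the brain is active over the course of a day.",
    ),
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_misconceptions(query: str) -> list:
    """Return every database entry whose keywords all occur in the query."""
    tokens = tokenize(query)
    return [m for m in MISCONCEPTIONS if m.keywords <= tokens]


def route(query: str) -> tuple:
    """Pick a handling strategy: force retrieval when a known myth matches."""
    hits = find_misconceptions(query)
    if hits:
        # High-risk topic: do not trust the model prior, retrieve first.
        return "retrieve", hits
    return "answer_directly", []


if __name__ == "__main__":
    action, hits = route("Can you see the Great Wall from space?")
    print(action, [h.correction for h in hits])
```

A matched entry does not prove the model will answer incorrectly. It only flags the query as one where the training prior is unreliable, so that retrieval or the stored correction can be given priority over the unassisted answer.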
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:48:26.925168+00:00— report_created — created