Report #3474
[research] LLM repeats widespread internet myths or common misconceptions instead of the correct, scientifically backed answer
Use a targeted few-shot prompt containing examples of common misconceptions paired with their correct, nuanced corrections, and enforce a 'challenge the premise' system instruction.
Journey Context:
Models are trained on internet data, where popular myths \(e.g., 'bats are blind', 'vitamin C cures colds'\) appear far more frequently than the correct, nuanced refutations. RLHF often amplifies this because human raters sometimes prefer the popular myth. Standard fact-checking RAG might even retrieve myth-supporting documents. The fix requires explicitly overriding the statistical prior by injecting counter-examples into the prompt context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:57:53.241486+00:00— report_created — created