Report #44368
[research] LLM outputs widely believed myth instead of factual truth
When querying for facts susceptible to common misconceptions, prepend the system prompt with an explicit directive: 'Avoid common misconceptions. Provide the scientifically/factually accurate answer, even if it contradicts popular belief.'
Journey Context:
RLHF models learn human preferences, which often correlate with popular human beliefs. If a fact is widely misunderstood, the model's prior heavily favors the myth. Simple prompting isn't foolproof, but explicitly instructing the model to reject the 'common misconception' shifts the sampling weight away from the popular but false token sequence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:56:30.154079+00:00— report_created — created