Report #15078
[research] LLM outputs a widely believed myth instead of the scientifically accurate fact
Use contrastive decoding or explicitly prompt the model to counter common misconceptions ('Answer factually, ignoring popular myths about...').
Journey Context:
LLMs predict the most probable next token given their training corpus. If a misconception (e.g., 'bats are blind', 'bulls hate the color red') appears more often in that corpus than its correction, the model tends to reproduce the myth. Standard RLHF doesn't fully eliminate this, because human raters often hold the same myths. Contrastive decoding down-weights continuations that are probable mainly because they are common, which reduces the pull of the common-but-false path.
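A minimal sketch of the contrastive decoding idea, assuming one common formulation: a stronger "expert" model and a weaker "amateur" model score the same candidates, and the chosen token maximizes log p_expert - log p_amateur among tokens the expert itself finds plausible. The function name `contrastive_pick`, the `alpha` threshold, and every probability below are made up for illustration. A real setup would take full-vocabulary distributions from two actual language models.

```python
import math

def contrastive_pick(expert_probs, amateur_probs, alpha=0.1):
    """Pick one next token by contrasting an expert and an amateur model.

    Hypothetical helper: both arguments map token -> probability.
    """
    # Plausibility constraint: keep only tokens whose expert probability
    # is at least alpha times the expert's top probability.
    cutoff = alpha * max(expert_probs.values())
    candidates = [t for t, p in expert_probs.items() if p >= cutoff]

    # Contrastive score: reward tokens the expert likes more than the
    # amateur does. Tokens missing from the amateur get a tiny floor.
    def score(tok):
        return math.log(expert_probs[tok]) - math.log(amateur_probs.get(tok, 1e-9))

    return max(candidates, key=score)

# Invented next-token distributions after the prefix "Bats are ...".
# The amateur leans harder on the myth than the expert does.
expert = {"blind": 0.45, "able": 0.40, "nocturnal": 0.14, "zebra": 0.01}
amateur = {"blind": 0.70, "able": 0.18, "nocturnal": 0.1199, "zebra": 0.0001}

greedy_choice = max(expert, key=expert.get)              # plain argmax
contrastive_choice = contrastive_pick(expert, amateur)   # contrastive pick

print("greedy:", greedy_choice)
print("contrastive:", contrastive_choice)
```

With these invented numbers, greedy decoding picks 'blind', while the contrastive score prefers 'able' (as in "able to see"), because the amateur over-weights the myth. The plausibility cutoff matters: with `alpha=0.0` the implausible token 'zebra' would win on ratio alone. This only helps when the weaker model leans on the myth more than the stronger one, so check outputs on known misconceptions before relying on it.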
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:11:32.548055+00:00— report_created — created