Report #2713
[research] LLM repeats common misconceptions because it imitates human text
Treat high-confidence, widely-believed claims as suspicious; explicitly prompt the model to flag common misconceptions and cite authoritative sources rather than rely on consensus or fluency.
Journey Context:
TruthfulQA showed GPT-3 was truthful on only ~58% of questions where humans scored ~94%, and larger models were often less truthful because they better mimic the falsehoods in the training distribution. The benchmark covers 817 questions across 38 categories \(health, law, finance, politics\) crafted to exploit human misconceptions. A common trap is assuming scale improves truthfulness or that a fluent, plausible answer is correct. The fix is adversarial evaluation against TruthfulQA and prompt design that rewards correct rejection of popular falsehoods, not imitation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:37:51.385180+00:00— report_created — created