Report #2452
[research] Model overrides rare but true facts with common but false associations
When querying about niche or long-tail entities, prepend context or use retrieval-augmented generation (RAG) to anchor the entity, rather than relying on zero-shot parametric recall.
Journey Context:
LLMs reflect the distribution of their training data. If entity A appears roughly 1000x more often than entity B, the model's internal representation of B can be heavily contaminated by A, and the model tends to hallucinate the popular entity's traits onto the rare one. Anchoring the entity in context before generation is the most reliable mitigation identified here for this prevalence bias.
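A minimal sketch of contextual anchoring, assuming a small in-memory fact store stands in for a real retriever. The names KNOWLEDGE, retrieve_facts, and build_anchored_prompt are hypothetical and not part of any library; the facts are placeholders. The resulting string would be sent to whatever model client is in use.

```python
from typing import Dict, List

# Hypothetical in-memory store; a real setup would query a search index or vector DB.
KNOWLEDGE: Dict[str, List[str]] = {
    "example rare entity": [
        "Placeholder fact 1 about the rare entity.",
        "Placeholder fact 2 about the rare entity.",
    ],
}


def retrieve_facts(entity: str, store: Dict[str, List[str]]) -> List[str]:
    """Return stored facts for the entity (case-insensitive exact match), else []."""
    return store.get(entity.strip().lower(), [])


def build_anchored_prompt(entity: str, question: str, facts: List[str]) -> str:
    """Prepend retrieved facts so the answer is grounded in context, not recall."""
    if not facts:
        # No anchor available: ask the model to admit uncertainty rather than guess.
        return (
            f"No reference context is available for '{entity}'.\n"
            "If you are not certain, say so instead of guessing.\n\n"
            f"Question: {question}"
        )
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"Reference context about '{entity}':\n{context}\n\n"
        "Answer using only the context above. "
        "Do not substitute details from similarly named entities.\n\n"
        f"Question: {question}"
    )
```

When no facts are found, the sketch falls back to an explicit "say so instead of guessing" instruction, since an unanchored query about a rare entity is exactly the case where the popular entity's traits are likely to leak in.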
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T11:58:08.740806+00:00— report_created — created