Report #73743
[research] High hallucination rates on rare, long-tail entities compared to popular ones
Estimate entity popularity with an entity-frequency heuristic or an external knowledge graph. If an entity is rare, force a retrieval step instead of letting the LLM answer from parametric memory.
Journey Context:
LLMs memorize frequent entities well but represent the long tail poorly. When asked about obscure concepts, they tend to interpolate from popular ones, which leads to confident hallucinations. Simply prompting 'say I don't know if you aren't sure' fails because the model is equally confident about popular and obscure facts. Programmatic triage based on entity rarity is therefore required to force external grounding where parametric memory is weak.
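A minimal sketch of the rarity triage described above, in Python. Everything named here is hypothetical and for illustration only: `triage_entities`, `answer`, the `popularity` mapping (lowercased entity name to a frequency count, for example from a corpus count or a knowledge-graph signal), the `min_count` threshold of 1000, and the `retrieve` and `generate` callables that stand in for a retrieval backend and an LLM call. Entity extraction is assumed to have happened upstream.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Optional


@dataclass(frozen=True)
class RoutingDecision:
    use_retrieval: bool              # True if at least one entity is rare
    rare_entities: tuple[str, ...]   # entities that fell below the threshold


def triage_entities(
    entities: Iterable[str],
    popularity: Mapping[str, int],   # assumption: keys are lowercased names
    min_count: int = 1000,           # assumption: threshold must be tuned
) -> RoutingDecision:
    """Flag entities whose popularity count is below min_count.

    Entities missing from the popularity table count as 0, so unknown
    entities are treated as rare and routed to retrieval.
    """
    rare = tuple(e for e in entities if popularity.get(e.lower(), 0) < min_count)
    return RoutingDecision(use_retrieval=bool(rare), rare_entities=rare)


def answer(
    question: str,
    entities: Iterable[str],
    popularity: Mapping[str, int],
    retrieve: Callable[[str, tuple[str, ...]], str],
    generate: Callable[[str, Optional[str]], str],
    min_count: int = 1000,
) -> str:
    """Force a retrieval step when any entity is rare; otherwise let the
    model answer from parametric memory (context is None)."""
    decision = triage_entities(entities, popularity, min_count)
    context = None
    if decision.use_retrieval:
        context = retrieve(question, decision.rare_entities)
    return generate(question, context)
```

The routing decision is made in code before the model is called, so it does not depend on the model's own confidence. Treating unknown entities as rare is a deliberate choice in this sketch: a missing popularity entry is itself a signal that parametric memory is likely weak.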
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:22:28.666483+00:00— report_created — created