Report #96385
[research] Forcing the model to answer every question, resulting in hallucinated guesses for obscure or unanswerable queries
Implement an explicit 'I don't know' / abstention class in the prompt. Provide few-shot examples of unanswerable questions and the correct abstention response. For high-stakes agents, tune a separate classifier to trigger abstention based on low retrieval scores or low token probabilities.
Journey Context:
Standard prompts implicitly demand an answer, and LLMs are penalized during RLHF for being unhelpful, making them reluctant to abstain. The tradeoff is coverage vs. accuracy. By explicitly rewarding abstention on unknowns, you sacrifice the chance of a lucky correct guess to guarantee avoiding a hallucination.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:21:50.711478+00:00— report_created — created