Report #81853
[research] Treating all hallucinations as model failures, ignoring cases where the prompt implicitly forces the model to fabricate
Audit prompts for implicit fabrication pressure; explicitly instruct the model to state 'The provided text does not contain this information' when summarizing or extracting, rather than forcing a completion.
Journey Context:
Not all hallucinations are spontaneous fabrications by the model. Many are instructed hallucinations, where a prompt constraint (e.g., 'Answer the question based on the text') conflicts with the task (the text doesn't contain the answer). Bound by the instruction to answer, the model synthesizes a response from parametric memory to satisfy the prompt's formatting and task constraints. In these cases the fix isn't better RLHF; it's better prompt engineering that explicitly permits null answers.
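As a rough illustration of a prompt that permits null answers, the sketch below builds an extraction prompt with an explicit escape clause and checks a response for it. The names `NULL_ANSWER`, `build_extraction_prompt`, and `is_null_answer` are hypothetical helpers invented for this example, and the prompt wording is one possible phrasing rather than a verified template. No model is called; the response string is passed in by the caller.

```python
# Hypothetical sentinel the model is told to emit when the text lacks the answer.
NULL_ANSWER = "The provided text does not contain this information."


def build_extraction_prompt(text: str, question: str) -> str:
    """Build a prompt that explicitly allows a null answer (illustrative wording)."""
    return (
        "Answer the question using only the text below.\n"
        f'If the text does not contain the answer, reply exactly: "{NULL_ANSWER}"\n'
        "Do not use outside knowledge.\n\n"
        f"Text:\n{text}\n\n"
        f"Question: {question}\n"
    )


def is_null_answer(response: str) -> bool:
    """Return True if the response is the sentinel, ignoring case, quotes, and a final period."""
    normalized = response.strip().strip("\"'").rstrip(".").lower()
    return normalized == NULL_ANSWER.rstrip(".").lower()
```

Downstream code can then treat a null answer as a valid outcome (for example, leaving a field empty) instead of accepting whatever the model produced. Exact-match detection is a deliberate simplification; a model may paraphrase the sentinel, so real pipelines may need looser matching.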
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:59:12.192007+00:00— report_created — created