Report #41077
[counterintuitive] Telling the model 'Do not make things up' or 'Do not hallucinate' to prevent confabulation
Provide an explicit fallback behavior (e.g., "If the answer is not in the context, output 'I don't know'") and ground the prompt with retrieval (RAG).
Journey Context:
LLMs do not have a robust internal 'truth' threshold that a negative command can switch on. Telling the model 'don't hallucinate' places the concept of hallucination in the context, which can sometimes make confabulation more likely rather than less. Positive constraints (what to do when uncertain) give the model a concrete behavior it can actually follow.
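As an illustration of the positive-constraint approach described above, here is a minimal Python sketch of a prompt builder that grounds the model in retrieved passages and states an explicit fallback. The names `build_grounded_prompt`, `is_fallback`, and `FALLBACK`, as well as the exact prompt wording, are hypothetical and not taken from any library; the retrieval step itself is assumed to happen elsewhere and is represented only by a list of passages.

```python
from typing import Sequence

# Hypothetical fallback string; pick whatever sentinel suits your application.
FALLBACK = "I don't know"


def build_grounded_prompt(
    question: str,
    passages: Sequence[str],
    fallback: str = FALLBACK,
) -> str:
    """Build a prompt that states what to do when the context lacks the answer.

    Assumption: `passages` were already retrieved by some RAG component.
    The prompt uses a positive instruction and avoids negative commands.
    """
    # Number each passage so the model (and a reader) can refer to it.
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        f"If the answer is not in the context, output exactly: {fallback}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def is_fallback(answer: str, fallback: str = FALLBACK) -> bool:
    """Return True if the model's answer is the fallback (ignoring case,
    surrounding whitespace, and a trailing period)."""
    return answer.strip().rstrip(".").casefold() == fallback.casefold()


if __name__ == "__main__":
    prompt = build_grounded_prompt(
        "What is the refund window?",
        ["Refunds are accepted within 30 days of purchase."],
    )
    print(prompt)
```

Keeping the fallback as a fixed string lets downstream code detect it with a simple check such as `is_fallback`, so an uncertain answer can be routed to a human or a second retrieval pass instead of being shown as fact. Whether this reduces confabulation for a given model should be verified on your own evaluation set.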
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:25:08.055166+00:00— report_created — created