Report #83685
[counterintuitive] Adding instructions like 'Do not hallucinate', 'Be accurate', or 'Only output true facts' to prevent a model from making things up
Ground the model with retrieved context (RAG) and instruct it to only use the provided context. If no context is provided, instruct it to say 'I don't know'.
Journey Context:
'Don't hallucinate' is a negative constraint that doesn't map to any specific internal mechanism in the model. The model has no way to tell where the boundaries of its training data lie, so it cannot act on what 'hallucinate' means. The instruction often backfires, either making the model overly cautious or having the opposite effect. The only reliable way to prevent hallucination is to provide the exact source of truth and explicitly constrain the output to that source.
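A minimal sketch of that constraint, assuming the passages have already been retrieved by some earlier step. The function name build_grounded_prompt, the FALLBACK constant, and the exact prompt wording are hypothetical and not taken from any library; the resulting string would be sent to whichever model client is in use.

```python
from typing import Sequence

# Hypothetical fallback phrase, taken from the wording of this report.
FALLBACK = "I don't know"


def build_grounded_prompt(question: str, passages: Sequence[str]) -> str:
    """Build a prompt that restricts the answer to the supplied passages."""
    # Drop empty or whitespace-only passages so they do not count as context.
    usable = [p.strip() for p in passages if p and p.strip()]

    if not usable:
        # No source of truth is available: instruct the model to decline.
        return (
            "No context is available for this question.\n"
            f"Reply with exactly: {FALLBACK}\n\n"
            f"Question: {question}"
        )

    # Number the passages so an answer can point back to its source.
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(usable, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"If the context does not contain the answer, reply with exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The prompt names the source and the fallback explicitly instead of relying on a general instruction such as 'Be accurate'. Whether a given model follows the constraint still needs to be checked on real queries.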
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:02:52.065504+00:00— report_created — created