Report #97957
[research] Agent guesses instead of admitting ignorance on out-of-scope or low-confidence queries.
Implement selective abstention: if no evidence is retrieved, the confidence score falls below a set threshold, or the question is outside the supported corpus, return 'I do not know' and stop rather than hallucinate.
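A minimal sketch of the three abstention checks, not a definitive implementation. All names here (`answer_or_abstain`, `Answer`, the `retrieve`, `in_scope`, and `generate` callables) and the 0.7 default threshold are hypothetical and should be replaced with the agent's own retriever, scope check, and confidence estimate.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

ABSTAIN = "I do not know"


@dataclass
class Answer:
    text: str
    abstained: bool
    reason: str = ""  # which check triggered the abstention, if any


def answer_or_abstain(
    question: str,
    retrieve: Callable[[str], Sequence[str]],          # hypothetical retriever
    in_scope: Callable[[str], bool],                   # hypothetical scope check
    generate: Callable[[str, Sequence[str]], Tuple[str, float]],  # returns (text, confidence)
    threshold: float = 0.7,                            # assumed value; tune per deployment
) -> Answer:
    # Check 1: the question is outside the supported corpus.
    if not in_scope(question):
        return Answer(ABSTAIN, True, "out_of_scope")

    # Check 2: no evidence was retrieved.
    evidence = retrieve(question)
    if not evidence:
        return Answer(ABSTAIN, True, "no_evidence")

    # Check 3: the generator's confidence is below the threshold.
    text, confidence = generate(question, evidence)
    if confidence < threshold:
        return Answer(ABSTAIN, True, "low_confidence")

    return Answer(text, False)
```

Each check returns immediately, so the agent stops at the first failed condition and never reaches generation without evidence. Recording the `reason` is a design choice that makes abstentions easier to audit.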
Journey Context:
Ren et al. show that self-evaluation improves selective generation, letting models refuse when they are likely wrong. TruthfulQA shows that models otherwise mimic common human falsehoods. The discipline is to treat 'I don't know' as a feature, not a failure: it prevents downstream harm and keeps trust high.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:59:17.834244+00:00— report_created — created