Report #54816
[research] Guessing instead of abstaining when knowledge is absent
Implement calibrated abstention: if retrieval yields no relevant context, or the probability the model assigns to its top token falls below a threshold, output a structured 'I don't know' rather than generating a plausible guess.
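A minimal sketch of such an abstention gate, in Python. The names `Answer`, `answer_or_abstain`, and `min_prob`, the default threshold of 0.5, and the choice to check the lowest top-token probability across the whole draft are illustrative assumptions, not part of this report. The threshold would need to be tuned on held-out data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Answer:
    """Structured result: either a generated answer or an explicit abstention."""
    abstained: bool
    text: Optional[str]
    reason: Optional[str]


def answer_or_abstain(
    draft: str,
    retrieved_context: Sequence[str],
    top_token_probs: Sequence[float],
    min_prob: float = 0.5,  # hypothetical threshold, to be calibrated
) -> Answer:
    """Return the draft only if both abstention checks pass."""
    # Check 1: retrieval produced nothing relevant.
    if not retrieved_context:
        return Answer(True, None, "no_relevant_context")
    # Check 2: some generated token had a top-token probability below the
    # threshold. Using the minimum over the draft is an assumption; a mean
    # or a per-span check are alternatives. Missing probabilities are
    # treated as low confidence here.
    if not top_token_probs or min(top_token_probs) < min_prob:
        return Answer(True, None, "low_confidence")
    return Answer(False, draft, None)
```

For example, `answer_or_abstain("use package foo", ["doc"], [0.9, 0.3])` abstains with reason `"low_confidence"`, because one token falls below the assumed 0.5 threshold.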
Journey Context:
RLHF trains models to be helpful, which inadvertently trains them to always provide an answer, even if fabricated. A low-confidence generation is a strong signal of hallucination. Selective generation (abstaining) trades recall for precision, preventing the propagation of fabricated code dependencies.
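The trade-off can be illustrated with a small sketch. It measures coverage (the fraction of questions answered, standing in for recall here) and precision (the fraction of answered questions that are correct) at a given confidence threshold. The function name and the sample values are made up for illustration and are not measurements from this report.

```python
from typing import List, Optional, Tuple

# Hypothetical (confidence, was_correct) pairs; toy values only.
TOY_SAMPLES: List[Tuple[float, bool]] = [
    (0.95, True),
    (0.90, True),
    (0.60, False),
    (0.55, True),
    (0.30, False),
]


def coverage_and_precision(
    samples: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, Optional[float]]:
    """Answer only when confidence >= threshold; abstain otherwise."""
    if not samples:
        raise ValueError("samples must be non-empty")
    answered = [ok for conf, ok in samples if conf >= threshold]
    coverage = len(answered) / len(samples)
    # Precision is undefined when the model abstains on everything.
    precision = sum(answered) / len(answered) if answered else None
    return coverage, precision
```

On the toy data, a threshold of 0.0 answers everything (coverage 1.0, precision 0.6), while a threshold of 0.7 answers only the two most confident items (coverage 0.4, precision 1.0).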
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:30:13.906207+00:00— report_created — created