Report #13043
[research] Forcing an answer when the model lacks sufficient context or knowledge
Implement calibrated abstention: explicitly add an 'I don't know' or 'Insufficient context' token/option, and tune the threshold using a held-out calibration set (e.g., via conformal prediction or temperature scaling on the model's logits).
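The threshold-tuning step could be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the report's implementation. It assumes the model exposes logits over a fixed list of candidate answers. Abstention is modeled as a decision rule that returns a hypothetical `ABSTAIN` label; adding a literal token to the vocabulary would require fine-tuning, which is not shown. Temperature scaling is fit by a coarse grid search instead of the gradient-based optimizer usually used. `pick_threshold` is a simple empirical heuristic with no finite-sample guarantee. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical label for the explicit abstention option (an assumption for illustration).
ABSTAIN = "Insufficient context"


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def mean_nll(logits_set: Sequence[Sequence[float]], labels: Sequence[int],
             temperature: float) -> float:
    """Mean negative log-likelihood of the correct answers at a given temperature."""
    return -sum(log_softmax(z, temperature)[y] for z, y in zip(logits_set, labels)) / len(labels)


def fit_temperature(logits_set, labels, grid=None) -> float:
    """Temperature scaling: pick T minimizing NLL on the held-out calibration set.
    A coarse grid search stands in for a gradient-based fit."""
    grid = grid or [0.05 * k for k in range(1, 201)]  # T in 0.05 .. 10.0
    return min(grid, key=lambda t: mean_nll(logits_set, labels, t))


def pick_threshold(confidences: Sequence[float], correct: Sequence[bool],
                   target_accuracy: float) -> float:
    """Smallest confidence cutoff whose answered subset reaches the target accuracy
    on the calibration set (empirical heuristic, no finite-sample guarantee)."""
    for t in sorted(set(confidences)):
        kept = [c for conf, c in zip(confidences, correct) if conf >= t]
        if sum(kept) / len(kept) >= target_accuracy:
            return t
    return math.inf  # no cutoff qualifies: always abstain


def answer_or_abstain(logits: Sequence[float], options: Sequence[str],
                      temperature: float, threshold: float) -> str:
    """Answer with the top option only if its calibrated probability clears the threshold."""
    probs = [math.exp(lp) for lp in log_softmax(logits, temperature)]
    best = max(range(len(probs)), key=probs.__getitem__)
    return options[best] if probs[best] >= threshold else ABSTAIN


# Toy calibration data: an overconfident model that is right 3 times out of 4.
cal_logits = [[5.0, 0.0]] * 4
cal_labels = [0, 0, 0, 1]
T = fit_temperature(cal_logits, cal_labels)  # optimum is near 5 / ln 3, about 4.55

# Calibrated top-option confidences on held-out questions, and whether each answer was right.
cal_conf = [0.9, 0.8, 0.6, 0.55]
cal_correct = [True, True, False, True]
threshold = pick_threshold(cal_conf, cal_correct, target_accuracy=0.9)
```

Dividing logits by a positive temperature never changes which option ranks first. It changes only the confidence that is compared against the threshold, so the abstention rule is the sole behavioral change.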
Journey Context:
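Standard instruction tuning effectively penalizes 'I don't know' responses unless the model is specifically trained to abstain, so models default to generating plausible-sounding text. Simply prompting 'say I don't know if you don't know' is unreliable because the model's internal confidence is poorly calibrated. Explicit calibration via conformal prediction provides a finite-sample, distribution-free guarantee on the miscoverage rate: if the calibration and test examples are exchangeable, the prediction set misses the correct answer with probability at most α, an error level chosen in advance. Temperature scaling alone improves calibration but carries no such guarantee.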
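A split-conformal version of the abstention rule might look like the sketch below. It assumes softmax probabilities over a fixed answer list and uses the common nonconformity score s = 1 - p(correct answer). It answers only when the conformal prediction set contains exactly one option. Names such as `conformal_quantile` and the `ABSTAIN` label are hypothetical. The guarantee covers the set, not accuracy among answered questions directly. One consequence does follow: a wrong answer is given only when the true answer lies outside the set, so the probability of answering incorrectly is at most α.

```python
import math
from typing import List, Sequence

# Hypothetical abstention label returned instead of an answer.
ABSTAIN = "I don't know"


def conformal_quantile(cal_probs: Sequence[Sequence[float]],
                       cal_labels: Sequence[int], alpha: float) -> float:
    """Split-conformal threshold for the nonconformity score s = 1 - p(true answer).

    If calibration and test examples are exchangeable, the set built from this
    threshold contains the true answer with probability at least 1 - alpha.
    """
    n = len(cal_labels)
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> List[int]:
    """Indices of all options whose score 1 - p does not exceed the threshold."""
    return [j for j, p in enumerate(probs) if 1.0 - p <= qhat]


def answer_or_abstain(probs: Sequence[float], options: Sequence[str], qhat: float) -> str:
    """Answer only when the conformal set is a single option; otherwise abstain."""
    candidates = prediction_set(probs, qhat)
    return options[candidates[0]] if len(candidates) == 1 else ABSTAIN


# Toy usage: ten calibration questions with two options; option 0 is always correct.
cal_probs = [[p, 1.0 - p] for p in (0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5)]
cal_labels = [0] * 10
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.2)
confident = answer_or_abstain([0.97, 0.03], ["yes", "no"], qhat)
unsure = answer_or_abstain([0.5, 0.5], ["yes", "no"], qhat)
```

An empty set (no option clears the threshold) and a multi-option set are both treated as "I don't know". With too few calibration points for the chosen α, the threshold becomes infinite, and the rule always abstains whenever more than one option exists.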
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T17:40:24.986676+00:00— report_created — created