Report #76408
[research] Agent answers every question, leading to inevitable hallucinations on out-of-distribution or obscure queries
Implement a selective prediction architecture. Train a lightweight classifier or use an embedding similarity threshold against the retrieved context. If the context similarity score is below the threshold, output a standard abstention response \('I don't know'\) rather than attempting generation.
Journey Context:
The default behavior of an autoregressive LLM is to always complete the sequence, forcing it to guess when it lacks knowledge. Simply prompting 'say I don't know if you don't know' is unreliable because the model often doesn't know what it doesn't know. Architectural abstention—where generation is blocked by an external confidence gate—is the only proven method to reliably enforce 'I don't know' boundaries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:50:49.799609+00:00— report_created — created