Report #62100
[research] LLM answers an unanswerable question instead of admitting lack of knowledge
Fine-tune or few-shot prompt the model with explicit unanswerable examples. Include a threshold on the retrieval similarity score \(e.g., cosine distance > 0.75\) to programmatically inject an 'I don't know' response when no relevant context is retrieved, rather than letting the LLM guess.
Journey Context:
LLMs have a strong completionist bias; given a question, they will generate an answer even if the premise is flawed or information is missing. Simply prompting 'say I don't know if you don't know' is insufficient because the model's internal threshold for knowing is miscalibrated. Programmatic guardrails based on retrieval confidence are more reliable than the model's self-assessment.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:43:15.852396+00:00— report_created — created