Report #41519
[research] Answering obscure or out-of-distribution coding questions with plausible but incorrect guesses instead of abstaining
Implement calibrated confidence scoring: prompt the model to output a confidence score (0-100) and enforce an abstention threshold (e.g., below 80), under which the model must output 'I don't know' or request more context.
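A minimal sketch of the threshold gate, assuming the model has been prompted to end its reply with a line of the form `Confidence: NN`. That reply format, the `gate_answer` helper, and the `GatedAnswer` type are hypothetical names introduced for illustration; they are not part of any specific library or of the original report.

```python
import re
from typing import NamedTuple, Optional

ABSTAIN_MESSAGE = "I don't know"

# Assumed reply format: a standalone line such as "Confidence: 85" or "confidence: 85%".
CONFIDENCE_RE = re.compile(
    r"^[ \t]*confidence[ \t]*:[ \t]*(\d{1,3})[ \t]*%?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


class GatedAnswer(NamedTuple):
    text: str                  # answer shown to the user, or the abstention message
    confidence: Optional[int]  # parsed self-rating, None if missing or out of range
    abstained: bool


def gate_answer(raw_response: str, threshold: int = 80) -> GatedAnswer:
    """Return the model's answer only if its self-rated confidence meets the threshold."""
    match = CONFIDENCE_RE.search(raw_response)
    if match is None:
        # No parseable score: treat as low confidence rather than trusting the answer.
        return GatedAnswer(ABSTAIN_MESSAGE, None, True)

    confidence = int(match.group(1))
    if confidence > 100:
        # Out-of-range rating is treated as unparseable.
        return GatedAnswer(ABSTAIN_MESSAGE, None, True)
    if confidence < threshold:
        return GatedAnswer(ABSTAIN_MESSAGE, confidence, True)

    # Strip the confidence line so only the answer text is returned.
    answer = CONFIDENCE_RE.sub("", raw_response).strip()
    return GatedAnswer(answer, confidence, False)
```

For example, `gate_answer("Use os.scandir.\nConfidence: 92")` passes the answer through, while a reply rated 40 or a reply with no score at all is replaced by the abstention message. Treating a missing score as an abstention is a design choice of this sketch, not something the report specifies.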
Journey Context:
LLMs are often miscalibrated: they answer almost every prompt, even when their internal knowledge is weak. Simply prompting 'say I don't know if you aren't sure' is insufficient because models have limited awareness of their own knowledge boundaries. Using logit-based confidence or explicit self-rating with a strict threshold forces selective answering, improving precision at the cost of recall: fewer wrong answers are emitted, but some answers that would have been correct are withheld as well.
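A rough sketch of the logit-based variant and of the trade-off it produces, assuming the serving stack exposes per-token log-probabilities for the generated answer. The helper names `sequence_confidence` and `selective_metrics` are hypothetical, and the geometric-mean token probability used here is only one common choice of confidence proxy, not a method confirmed by the report. Coverage (the fraction of prompts answered) stands in for recall.

```python
import math
from typing import Sequence, Tuple


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated answer, in [0, 1]."""
    if not token_logprobs:
        return 0.0  # nothing generated: treat as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def selective_metrics(
    records: Sequence[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Given (confidence, was_correct) pairs, return (precision, coverage).

    precision: share of answered prompts that were correct
    coverage:  share of all prompts that were answered at all
    """
    answered = [correct for conf, correct in records if conf >= threshold]
    coverage = len(answered) / len(records) if records else 0.0
    precision = sum(answered) / len(answered) if answered else 0.0
    return precision, coverage
```

Sweeping `threshold` over a labeled evaluation set and comparing the resulting `(precision, coverage)` pairs is one way to choose a cutoff, rather than fixing 80 up front.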
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:09:43.958027+00:00— report_created — created