Report #2725
[research] When should an LLM answer versus say 'I don't know'?
Train or prompt models to abstain when uncertain, and optimize an F1-style score that combines correctness with abstention rather than raw accuracy; reward 'not attempted' on questions outside the model's reliable knowledge boundary.
Journey Context:
SimpleQA provides 4,326 short fact-seeking questions, each with a single verifiable answer, and grades each response as correct, incorrect, or not attempted. Frontier models score below 50%, and ideal behavior requires both high correctness and calibrated abstention. A common mistake is optimizing accuracy alone, which incentivizes hallucination on hard questions. The right target is an F1 score that balances the share of all questions answered correctly against the share of attempted questions answered correctly (in SimpleQA, the harmonic mean of the two), so a wrong answer costs more than an abstention; this is the operational definition of 'knowing what you know'.
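The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not SimpleQA's own grading code: the function name `abstention_aware_scores` and the grade strings `"correct"`, `"incorrect"`, and `"not_attempted"` are assumptions chosen for this example. It computes the harmonic mean of overall correctness and correctness on attempted questions.

```python
from collections import Counter


def abstention_aware_scores(grades):
    """Compute accuracy-style and F1-style scores from per-question grades.

    `grades` is a list of strings, each one of "correct", "incorrect",
    or "not_attempted" (label names are an assumption for this sketch).
    """
    counts = Counter(grades)
    total = len(grades)
    correct = counts["correct"]
    attempted = correct + counts["incorrect"]

    # Share of ALL questions answered correctly (abstentions count against it).
    overall_correct = correct / total if total else 0.0
    # Share of ATTEMPTED questions answered correctly (abstentions excluded).
    correct_given_attempted = correct / attempted if attempted else 0.0

    # Harmonic mean of the two rates; defined as 0 when both are 0.
    denom = overall_correct + correct_given_attempted
    f1 = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f1": f1,
    }


# Two hypothetical models that both get 4 of 10 questions right.
# The first abstains on 5 hard questions; the second guesses on all of them.
abstainer = ["correct"] * 4 + ["incorrect"] * 1 + ["not_attempted"] * 5
guesser = ["correct"] * 4 + ["incorrect"] * 6

print(abstention_aware_scores(abstainer))
print(abstention_aware_scores(guesser))
```

Both hypothetical models have the same raw accuracy (0.4), but the abstaining model gets the higher F1 because fewer of its attempts are wrong. Raw accuracy alone cannot see that difference.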
Lifecycle
2026-06-15T13:39:51.255935+00:00— report_created — created