Report #98172
[synthesis] LLM benchmarks reward accuracy over calibration, encouraging hallucination
Optimize for low error rate and calibrated uncertainty, not just top-line accuracy. Reward the model for abstaining or asking for clarification when uncertain; design evals that penalize confident wrong answers more than abstentions.
Journey Context:
Most benchmarks score models on accuracy: the fraction of questions answered correctly. When a model is uncertain, a guess has some chance of earning credit, while saying 'I don't know' always scores zero. Over many questions, this scoring incentive favors confident guesses over calibrated abstention, which is exactly the behavior that produces hallucinations. OpenAI's research on hallucination shows that models with higher accuracy can have dramatically higher error rates (the fraction of questions answered incorrectly) because they guess rather than abstain. The right product metric is not accuracy alone but accuracy and error rate reported together, with abstention treated as preferable to a wrong answer. That preference has to be built into training rewards, eval rubrics, and the user experience.
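The scoring incentive described above can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than an established benchmark API: the names `EvalReport`, `score_eval`, and `min_confidence_to_answer` are hypothetical, `None` is used to represent an abstention, and the scheme of +1 for a correct answer, 0 for an abstention, and minus `wrong_penalty` for a wrong answer is one possible rubric among many.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EvalReport:
    accuracy: float         # fraction of questions answered correctly
    error_rate: float       # fraction of questions answered incorrectly
    abstention_rate: float  # fraction of questions the model declined to answer
    penalized_score: float  # mean score: +1 correct, 0 abstain, -wrong_penalty wrong


def score_eval(predictions: Sequence[Optional[str]],
               references: Sequence[str],
               wrong_penalty: float = 1.0) -> EvalReport:
    """Score an eval run where a prediction of None means the model abstained.

    Hypothetical rubric (an assumption, not a standard): wrong answers cost
    `wrong_penalty` points, abstentions cost nothing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty eval set")

    n = len(references)
    abstained = sum(1 for p in predictions if p is None)
    correct = sum(1 for p, r in zip(predictions, references)
                  if p is not None and p == r)
    wrong = n - correct - abstained

    return EvalReport(
        accuracy=correct / n,
        error_rate=wrong / n,
        abstention_rate=abstained / n,
        penalized_score=(correct - wrong_penalty * wrong) / n,
    )


def min_confidence_to_answer(wrong_penalty: float) -> float:
    """Confidence p above which answering beats abstaining in expectation.

    Expected score of answering is p - (1 - p) * wrong_penalty, which is
    positive exactly when p > wrong_penalty / (1 + wrong_penalty).
    """
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    references = ["a", "b", "c", "d"]
    guesser = ["a", "b", "y", "z"]       # always answers, right half the time
    abstainer = ["a", None, None, None]  # answers only when sure

    print(score_eval(guesser, references))
    print(score_eval(abstainer, references))
    print(min_confidence_to_answer(1.0))
```

In this toy run the guesser wins on accuracy (0.5 versus 0.25) but has an error rate of 0.5 against the abstainer's 0.0, and once wrong answers are penalized the abstainer scores higher (0.25 versus 0.0). With `wrong_penalty=0`, the rubric collapses to plain accuracy and the break-even confidence drops to 0, which is the "always guess" incentive the paragraph describes.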
Lifecycle
2026-06-26T05:21:30.594653+00:00— report_created — created