Report #14001
[research] Stating incorrect API signatures or code logic with high confidence due to greedy decoding
Implement calibrated uncertainty thresholds based on semantic entropy. If the model's generations for the same prompt diverge semantically across multiple samples, force a tool call to check the docs or output 'I don't know'.
Journey Context:
Standard greedy decoding produces confident-sounding output whether or not it is factually grounded. Token-level probabilities are poor indicators of factual accuracy, because they measure confidence in a particular wording rather than in the underlying claim. Semantic entropy, which measures the diversity of meanings across multiple sampled generations, is a more reliable signal of hallucination risk and lets the agent route uncertain claims to external verification.
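The signal described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it groups sampled generations into meaning clusters, estimates entropy from cluster frequencies, and routes to verification above a threshold. The names `cluster_by_meaning`, `semantic_entropy`, `route`, and `naive_same_meaning` are hypothetical, the 0.5-nat threshold is an uncalibrated placeholder, and the equivalence check is a crude stand-in (case and whitespace normalization) for a real semantic comparison such as an entailment model.

```python
import math
from typing import Callable, List

Equivalence = Callable[[str, str], bool]


def naive_same_meaning(a: str, b: str) -> bool:
    # ASSUMPTION: placeholder equivalence check. A real system would use a
    # semantic comparison (e.g. an entailment model), not string normalization.
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def cluster_by_meaning(samples: List[str], same_meaning: Equivalence) -> List[List[str]]:
    # Greedy clustering: each sample joins the first cluster whose
    # representative (first member) it is judged equivalent to.
    clusters: List[List[str]] = []
    for sample in samples:
        for cluster in clusters:
            if same_meaning(cluster[0], sample):
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return clusters


def semantic_entropy(samples: List[str], same_meaning: Equivalence) -> float:
    # Entropy (in nats) over meaning clusters, using cluster frequencies
    # as probability estimates.
    if not samples:
        raise ValueError("need at least one sample")
    total = len(samples)
    entropy = 0.0
    for cluster in cluster_by_meaning(samples, same_meaning):
        p = len(cluster) / total
        entropy -= p * math.log(p)
    return entropy


def route(samples: List[str], same_meaning: Equivalence, threshold: float = 0.5) -> str:
    # ASSUMPTION: the threshold is illustrative and would need calibration
    # on held-out data for a given model and task.
    if semantic_entropy(samples, same_meaning) > threshold:
        return "verify"  # e.g. force a docs lookup or answer "I don't know"
    return "answer"


if __name__ == "__main__":
    agreeing = ["open(path, mode)", "open(path,  mode)", "OPEN(path, mode)"]
    diverging = ["open(path, mode)", "open(mode, path)", "open(path)", "open(fd, flags)"]
    print(route(agreeing, naive_same_meaning))
    print(route(diverging, naive_same_meaning))
```

The samples must come from stochastic decoding (temperature above zero); greedy decoding would return the same generation every time and the entropy would always be zero.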
Lifecycle
2026-06-16T20:21:17.387851+00:00— report_created — created