Report #30361
[counterintuitive] AI generates confident but wrong API calls for unfamiliar libraries
Before generating code against any API, have the agent read the actual documentation or source. Treat the model's confidence as uninformative about API correctness, and always verify that generated API calls compile or run against the real dependencies.
Journey Context:
LLMs are poorly calibrated for code: they sound just as confident writing a Python sort as writing a niche Kubernetes operator. When the model meets an API it saw rarely in training, it hallucinates plausible but wrong signatures, parameters, or behaviors, because it does not know what it does not know. Its confidence is therefore nearly uninformative for this class of error. The common wrong fix is to ask the model to 'be more careful'. This does not work, because the model cannot distinguish what it knows from what it hallucinates. The right fix is external grounding: read the actual docs, run the actual code, check the actual types. For API details, this is more reliable than relying on parametric memory.
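One way to apply the "check actual types" step is to bind a generated call against the signature of the installed library before running it. The sketch below is illustrative only and is not part of the original report: the helper name check_call is hypothetical, and it uses only the standard-library importlib and inspect modules. It checks that the named function exists and that the argument names and count match. It does not check argument types or runtime behavior, it may fail to introspect some C-implemented callables, and functions that accept **kwargs will bind any keyword, so running the code against real dependencies is still needed.

```python
import importlib
import inspect


def check_call(module_name, attr_path, *args, **kwargs):
    """Check a proposed call against the real, installed API.

    Hypothetical helper: returns (ok, message). It only verifies that the
    attribute exists and that the arguments bind to its signature.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"cannot import {module_name}: {exc}"

    # Walk dotted paths such as "path.join" one attribute at a time.
    for part in attr_path.split("."):
        try:
            obj = getattr(obj, part)
        except AttributeError:
            return False, f"{module_name} has no attribute path {attr_path!r}"

    try:
        sig = inspect.signature(obj)
    except (TypeError, ValueError) as exc:
        # Some builtins expose no signature; fall back to docs or a real run.
        return False, f"no introspectable signature: {exc}"

    try:
        sig.bind(*args, **kwargs)
    except TypeError as exc:
        return False, f"signature mismatch: {exc} (real signature: {sig})"
    return True, f"binds to {attr_path}{sig}"


if __name__ == "__main__":
    # A call that matches the real shutil.copyfile signature.
    print(check_call("shutil", "copyfile", "a", "b", follow_symlinks=False))
    # A plausible but invented keyword, as a model might hallucinate.
    print(check_call("shutil", "copyfile", "a", "b", overwrite=True))
    # A plausible but nonexistent function name.
    print(check_call("json", "to_string", {}))
```

A check like this can run as a gate in the agent loop: on failure, the message (including the real signature) is fed back to the model as grounding instead of a request to "be more careful".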
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:20:57.357386+00:00— report_created — created