Report #93050
[research] Hallucinating Non-Existent API Functions or Arguments
Ground code generation by injecting the actual API documentation or type signatures into the prompt, and validate generated code with static analysis (e.g., mypy) or in an isolated sandbox before presenting it to the user.
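One way to do the grounding step is to pull signatures from the installed library rather than writing them by hand. This is a minimal sketch, not a definitive implementation: `describe_api` and `build_prompt` are hypothetical helper names, the prompt wording is an assumption, and the standard-library `json` module stands in for whatever API the model is expected to call.

```python
import inspect
import json


def describe_api(module, names):
    """Return real signatures and one-line summaries for the named callables."""
    lines = []
    for name in names:
        # getattr raises AttributeError if the name does not exist on the module,
        # so a wrong entry in the allow-list is caught here, not in the prompt.
        obj = getattr(module, name)
        try:
            sig = str(inspect.signature(obj))
        except (TypeError, ValueError):
            sig = "(...)"  # some C-implemented callables expose no signature
        doc = inspect.getdoc(obj) or ""
        summary = doc.splitlines()[0] if doc else ""
        lines.append(f"{module.__name__}.{name}{sig}  # {summary}")
    return "\n".join(lines)


def build_prompt(task, module, names):
    """Hypothetical prompt layout: real signatures first, then the task."""
    return (
        "Use only the functions listed below; do not invent others.\n"
        f"{describe_api(module, names)}\n\n"
        f"Task: {task}"
    )


if __name__ == "__main__":
    print(build_prompt("Serialize a dict to a string", json, ["dumps", "loads"]))
```

Because the signatures come from the installed version of the library, the prompt stays in sync with the environment the generated code will run in.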
Journey Context:
Code LLMs are trained on large volumes of GitHub code, which leads them to blend APIs from different libraries or invent plausible-sounding methods (e.g., a hallucinated itercols() method). Benchmarks like HumanEval don't fully capture this failure mode, but ToolBench does. Static verification is the only reliable fix: the model shows the same confidence in hallucinated calls as in correct ones, and runtime failures are expensive.
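A lightweight check along these lines can be sketched with the standard-library `ast` module. The helper name `find_missing_attributes` is hypothetical, and the sketch rests on several assumptions: it only inspects `module.attr` references on modules brought in by plain `import` statements, it imports those modules (so it should run in a trusted or isolated environment), and it never executes the generated code itself. It will not catch invented methods on objects, such as a made-up method on a DataFrame; that case needs a type checker like mypy with library stubs.

```python
import ast
import importlib


def find_missing_attributes(source):
    """Return dotted names such as 'math.sqroot' that do not exist on an imported module."""
    tree = ast.parse(source)

    # Map each local name bound by `import x` or `import x as y` to the real module.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `a.b` to `c`.
                target = alias.name if alias.asname else alias.name.split(".")[0]
                local = alias.asname or target
                try:
                    aliases[local] = importlib.import_module(target)
                except ImportError:
                    pass  # module not installed; nothing to check against

    # Flag every `name.attr` where `name` is a known module lacking `attr`.
    missing = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in aliases
            and not hasattr(aliases[node.value.id], node.attr)
        ):
            missing.append(f"{node.value.id}.{node.attr}")
    return missing


if __name__ == "__main__":
    generated = "import math\nprint(math.sqrt(4))\nprint(math.sqroot(4))\n"
    print(find_missing_attributes(generated))
```

A non-empty result is a reason to reject or regenerate the code before it reaches the user; an empty result only means this narrow check found nothing.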
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:46:23.170979+00:00— report_created — created