Report #48084
[research] Model generates code calling plausible but non-existent library functions or standard library methods
Add an automated validation step for generated code, using static analysis or sandboxed execution. Cross-reference imported modules and called methods against the library's documentation or against the installed package itself (for example, by parsing its source into an AST). If validation fails, feed the error back to the model for correction.
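A minimal sketch of the static cross-reference step, assuming Python and only the standard library. The function name `find_missing_attributes` is hypothetical. It handles plain `import x` and `import x as y` statements and checks attribute chains that start at the imported module name. It does not follow `from x import y`, does not infer the types of variables (so `df.transform_rows()` on an instance would not be caught), and it imports the referenced modules, which runs their import-time code.

```python
import ast
import importlib


def find_missing_attributes(source: str) -> list[str]:
    """Return dotted names (e.g. 'math.sqroot') that do not resolve on the installed module."""
    tree = ast.parse(source)

    # Map each local name to the module it is bound to.
    # `import os.path` binds `os`; `import json as j` binds `j` to `json`.
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    top = alias.name.split(".")[0]
                    aliases[top] = top

    missing: list[str] = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Attribute):
            continue

        # Unwind a chain such as `os.path.join` into ['path', 'join'].
        parts: list[str] = []
        cur: ast.AST = node
        while isinstance(cur, ast.Attribute):
            parts.append(cur.attr)
            cur = cur.value
        if not (isinstance(cur, ast.Name) and cur.id in aliases):
            continue  # chain does not start at an imported module name
        parts.reverse()

        module_name = aliases[cur.id]
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            if module_name not in missing:
                missing.append(module_name)
            continue

        # Walk the chain on the real object and stop at the first unknown attribute.
        for i, part in enumerate(parts):
            if not hasattr(obj, part):
                dotted = ".".join([module_name, *parts[: i + 1]])
                if dotted not in missing:
                    missing.append(dotted)
                break
            obj = getattr(obj, part)

    return missing
```

For example, `find_missing_attributes("import math\nmath.sqroot(4)\n")` reports `math.sqroot`, and that string can be passed back to the model as the correction hint.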
Journey Context:
Code LLMs predict the next token based on syntactic patterns. They invent plausible-sounding methods (e.g., pandas.DataFrame.transform_rows() instead of apply()) that fit the semantic context but do not exist. Prompting the model to 'only use valid APIs' does not eliminate this, because the model has no way to check whether an API it produces actually exists. Only sandboxed execution (REPL) or AST checking against the installed package provides ground truth.
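The execute-and-feed-back loop could look roughly like the sketch below. It is an illustration under stated assumptions, not a hardened implementation: `run_snippet` and `generate_with_validation` are hypothetical names, `generate` stands in for whatever function calls the model, and a plain subprocess with a timeout is not a real sandbox. Untrusted code would need stronger isolation (a container or VM, for instance).

```python
import subprocess
import sys
from typing import Callable, Optional


def run_snippet(code: str, timeout: float = 10.0) -> Optional[str]:
    """Run code in a separate interpreter. Return None on success, else the error text."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Timed out after {timeout} seconds"
    if proc.returncode == 0:
        return None
    # The traceback on stderr names the missing attribute or module.
    return proc.stderr.strip() or f"Exited with code {proc.returncode}"


def generate_with_validation(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical: wraps the model call
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for code, run it, and retry with the error appended until it runs cleanly."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(prompt + feedback)
        error = run_snippet(code)
        if error is None:
            return code
        feedback = f"\n\nThe previous attempt failed with:\n{error}\nPlease fix the code."
    return None  # no attempt ran cleanly
```

Execution catches a hallucinated call only if the failing line is actually reached, so this complements a static check and does not replace it.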
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:11:48.755240+00:00— report_created — created