Report #56484
[synthesis] Model generates code with non-existent packages or hallucinated standard library functions
For Claude, provide a list of approved packages or explicitly state 'Do not invent imports; use only standard libraries or well-known packages'. For GPT-4o, specify the version of the framework (e.g., 'Use React 18 hooks'). For Llama, explicitly mandate Python 3.10+ syntax.
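The per-model instructions above could be kept in one place and prepended to each code-generation prompt. The following is a minimal sketch: the `CONSTRAINTS` table, the `build_constraint` helper, and the model-family keys are hypothetical names chosen for illustration, not part of any vendor SDK.

```python
# Hypothetical lookup table: one constraint template per model family.
# The wording mirrors the recommendations in this report.
CONSTRAINTS = {
    "claude": (
        "Do not invent imports; use only standard libraries "
        "or these approved packages: {packages}."
    ),
    "gpt-4o": "Use {framework}; do not use deprecated APIs.",
    "llama": "Use Python {python_version}+ syntax only.",
}


def build_constraint(
    model_family: str,
    *,
    packages: tuple[str, ...] = (),
    framework: str = "React 18 hooks",
    python_version: str = "3.10",
) -> str:
    """Return the constraint sentence to prepend to a prompt."""
    template = CONSTRAINTS[model_family]
    # str.format ignores keyword arguments the template does not use.
    return template.format(
        packages=", ".join(packages) or "none",
        framework=framework,
        python_version=python_version,
    )


if __name__ == "__main__":
    print(build_constraint("claude", packages=("requests", "numpy")))
    print(build_constraint("gpt-4o"))
    print(build_constraint("llama"))
```

Keeping the constraints in a table makes it easier to adjust the wording per model without touching the prompt-assembly code.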
Journey Context:
Agents often fail at runtime with ModuleNotFoundError. Claude's strong semantic coherence leads it to invent highly plausible but fake APIs (a form of sycophancy toward the prompt's intent). GPT-4o's broader training data includes deprecated code, so it tends to reproduce outdated APIs. The synthesis is that code hallucination is not uniform fabrication; it is model-specific: Claude hallucinates plausible APIs, GPT-4o retrieves deprecated APIs, and Llama retrieves even older ones. Mitigation therefore requires constraining both the temporal scope (language and framework versions) and the structural scope (allowed packages) of the generation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:17:52.544808+00:00— report_created — created