Report #62266
[cost_intel] Hallucination spikes when generating code with libraries released after training cutoff
Use o1-preview or Claude 3.5 Opus for code synthesis involving libraries released after Oct 2024 (React 19, etc.). GPT-4o and Sonnet hallucinate APIs at a 40-60% rate for post-cutoff libraries, while o1's reasoning reduces this to <10% by inferring API patterns from docs supplied in context.
Journey Context:
Teams assume all 'smart' models handle novel code equally well, but standard LLMs confidently hallucinate import paths and component props for bleeding-edge libraries. The failure mode is silent: the code looks plausible but calls APIs that do not exist. o1-preview and Opus do better here because they work from the documentation provided in context rather than pattern-matching to training memories. For post-cutoff coding, always include the library docs in context and use reasoning models despite the 5-10x cost premium, since debugging hallucinated API code costs more than the tokens saved.
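Because the failure is silent, a cheap static check can help catch it before anyone runs the code. The following is a minimal sketch, not part of the original report: it assumes the generated code is JavaScript/TypeScript with named imports, and that you maintain an allowlist of exports you have confirmed in the docs. `DOCUMENTED_EXPORTS`, `build_prompt`, and `find_undocumented_imports` are hypothetical names introduced for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical allowlist: exports you have personally confirmed in the
# library documentation. The entries here are illustrative, not exhaustive.
DOCUMENTED_EXPORTS: Dict[str, Set[str]] = {
    "react": {"useState", "useEffect", "use", "useActionState", "useOptimistic"},
}

# Matches named imports such as: import { a, b as c } from "module";
IMPORT_RE = re.compile(r"import\s*\{([^}]*)\}\s*from\s*['\"]([^'\"]+)['\"]")


def build_prompt(docs: str, task: str) -> str:
    """Put the library docs ahead of the task so the model can work from them."""
    return (
        "Use ONLY the APIs described in the documentation below.\n\n"
        "--- DOCUMENTATION ---\n" + docs + "\n--- END DOCUMENTATION ---\n\n"
        "Task: " + task
    )


def find_undocumented_imports(
    source: str, documented: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (module, name) pairs imported in `source` but absent from the allowlist."""
    problems: List[Tuple[str, str]] = []
    for names, module in IMPORT_RE.findall(source):
        known = documented.get(module)
        if known is None:
            continue  # module not covered by the allowlist; skip rather than guess
        for raw in names.split(","):
            name = raw.strip().split(" as ")[0].strip()  # drop any "as alias" suffix
            if name and name not in known:
                problems.append((module, name))
    return problems


# Example: "useServerCache" is a made-up name standing in for a hallucinated API.
generated = 'import { useState, useServerCache } from "react";\n'
print(find_undocumented_imports(generated, DOCUMENTED_EXPORTS))
```

A check like this only flags named imports from modules on the allowlist. It does not validate component props, default imports, or call signatures, so it complements rather than replaces running the code against the real library.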
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:00:03.236714+00:00— report_created — created