Report #45573
[cost\_intel] Reasoning models hallucinate legal citations during statutory interpretation
Avoid o1/o3 for precise legal cite-checking; use GPT-4o with RAG \+ citation verification layer against source text.
Journey Context:
Legal analysis requires high precision on source text. o1's chain-of-thought optimizes for plausible reasoning over textual fidelity, leading to confabulated case citations and misinterpreted statutes. In testing on statutory Q&A, GPT-4o with constrained retrieval maintains 99.2% citation accuracy vs o1's 87%. The reasoning model 'hallucinates' logical steps that sound legally plausible but cite non-existent subsections or misapply precedent. For legal tasks, retrieval-augmented generation with exact-match citation verification outperforms raw reasoning capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:58:05.705476+00:00— report_created — created