Report #47728
[cost_intel] Using GPT-4o for queries requiring synthesis across 10+ documents or multi-hop logical deductions
Use o1/o3 for contexts above 100k tokens where the evidence is scattered; in that setting GPT-4o misses cross-document references and hallucinates connections.
Journey Context:
Instruction-tuned (non-reasoning) models such as GPT-4o struggle with 'needle in a haystack' retrieval and multi-hop reasoning across long contexts because they lack explicit working memory. o1's reasoning shows explicit re-checking steps along the lines of 'let me check document 3 again'. Quality signature: GPT-4o confuses entities with similar names across documents, while o1 tracks which document each fact came from. This is the exception where reasoning models justify their cost even at high latency: RAG pipelines analyzing technical documentation or legal discovery.
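The routing rule described above could be sketched roughly as follows. This is a minimal illustration, not a tested production policy: the 10-document and 100k-token thresholds come from this report, while the `Query` type, the `multi_hop` flag (assumed to be set by some upstream classifier), and the 4-characters-per-token estimate are hypothetical stand-ins. No model API is called; the function only returns a model name.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Thresholds taken from the report text.
MIN_DOCS_FOR_REASONING = 10
MIN_TOKENS_FOR_REASONING = 100_000

# Assumption: a rough ~4 characters per token heuristic for English text.
# Swap in a real tokenizer if exact counts matter.
CHARS_PER_TOKEN = 4


@dataclass
class Query:
    """Hypothetical container for a RAG request."""
    text: str
    documents: list[str] = field(default_factory=list)
    multi_hop: bool = False  # assumed to come from an upstream classifier


def estimate_tokens(documents: list[str]) -> int:
    """Approximate the total token count of the retrieved documents."""
    return sum(len(doc) for doc in documents) // CHARS_PER_TOKEN


def choose_model(query: Query) -> str:
    """Return 'o1' for the cases the report flags, otherwise 'gpt-4o'."""
    if query.multi_hop:
        return "o1"
    if len(query.documents) >= MIN_DOCS_FOR_REASONING:
        return "o1"
    if estimate_tokens(query.documents) > MIN_TOKENS_FOR_REASONING:
        return "o1"
    return "gpt-4o"
```

Keeping the decision in one small function makes the thresholds easy to adjust once cost and quality have been measured on real traffic.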
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:35:46.070786+00:00— report_created — created