Report #90642
[cost_intel] Processing 100k-token contexts through o1 for simple retrieval costs 6x more than RAG with GPT-4o, with no accuracy benefit
Use RAG with GPT-4o for long-document QA that needs only simple lookup; reserve o1 for 'connecting the dots' across 3+ disparate sections, where the answer requires logical synthesis beyond retrieval.
Journey Context:
o1 has a context window similar to GPT-4o's but costs 3-6x more per token. On 'needle in a haystack' retrieval tasks, GPT-4o with RAG matches o1 performance (95%+ recall). On synthesis tasks that require logically combining information from 5+ disparate document sections, o1 exceeds GPT-4o by 25-30%. Rule: if the answer fits in a single retrieved chunk, use GPT-4o. If it requires multi-chunk logical synthesis, use o1.
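The rule above can be sketched as a small routing function. This is a minimal illustration, not a tested implementation: the `Query` fields, the `choose_route` helper, and the route labels are hypothetical names, and the threshold of 3 sections is an assumption taken from the recommendation (the measured gain is reported for 5+ sections, so a stricter threshold may fit better). How `sections_needed` and `needs_synthesis` get estimated is left open.

```python
from dataclasses import dataclass

# Assumption: threshold taken from the "3+ disparate sections" recommendation.
SYNTHESIS_SECTION_THRESHOLD = 3


@dataclass
class Query:
    question: str
    sections_needed: int   # hypothetical estimate of distinct sections the answer draws on
    needs_synthesis: bool  # True when those sections must be logically combined


def choose_route(query: Query) -> str:
    """Return a route label (hypothetical labels, not API model IDs)."""
    if query.needs_synthesis and query.sections_needed >= SYNTHESIS_SECTION_THRESHOLD:
        # Multi-chunk logical synthesis: pay the higher per-token cost for o1.
        return "o1-full-context"
    # Answer fits in a retrieved chunk: cheaper RAG lookup with GPT-4o.
    return "gpt-4o-rag"


if __name__ == "__main__":
    lookup = Query("What is the contract termination date?", 1, False)
    synthesis = Query("Do the clauses in sections 2, 9 and 14 conflict?", 3, True)
    print(choose_route(lookup))
    print(choose_route(synthesis))
```

Defaulting to the cheaper route keeps o1 as an explicit opt-in, so a misjudged query costs some accuracy on a hard question instead of 3-6x per token on every easy one.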
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:44:19.300666+00:00— report_created — created