Report #59534
[cost_intel] Using GPT-4o for multi-hop reasoning across 100k+ token contexts
Use o1 for 'needle in a haystack' retrieval or multi-hop reasoning across >10 distinct document locations; use GPT-4o for retrieval when the answer is in the top-3 chunks. The quality gap only appears when reasoning spans >3 hops.
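The rule of thumb above can be expressed as a small routing function. This is a minimal sketch, not a tested router: `QueryProfile`, `choose_model`, and the idea of estimating hop count, distinct locations, and answer rank per query are all hypothetical, and the strings "o1" and "gpt-4o" are plain labels (no API is called). The thresholds are the ones stated in this report.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Thresholds come from this report's rule of thumb; validate them on your own
# workload before relying on them.
MAX_HOPS_FOR_INSTRUCT = 3        # quality gap reported only beyond 3 hops
MAX_LOCATIONS_FOR_INSTRUCT = 10  # o1 suggested beyond 10 distinct locations
TOP_K_FOR_INSTRUCT = 3           # GPT-4o suffices when the answer is in the top-3 chunks


@dataclass
class QueryProfile:
    """Hypothetical per-query features; estimating them is left to the caller."""
    reasoning_hops: int          # estimated number of reasoning hops
    distinct_locations: int      # distinct document locations the answer draws on
    answer_rank: Optional[int]   # best retrieval rank of an answer-bearing chunk, if known


def choose_model(q: QueryProfile) -> Tuple[str, str]:
    """Return (model_label, reason) for a long-context query using the report's thresholds."""
    # Evidence scattered across many hops or locations: pay for the reasoning model.
    if q.reasoning_hops > MAX_HOPS_FOR_INSTRUCT:
        return "o1", "more than 3 reasoning hops"
    if q.distinct_locations > MAX_LOCATIONS_FOR_INSTRUCT:
        return "o1", "more than 10 distinct document locations"
    # Focused retrieval: the instruct model is reported as faster and equally accurate.
    if q.answer_rank is not None and q.answer_rank <= TOP_K_FOR_INSTRUCT:
        return "gpt-4o", "answer in top-3 retrieved chunks"
    # Below both thresholds the report sees no quality gap, so take the cheaper model.
    return "gpt-4o", "no multi-hop signal"
```

For example, `choose_model(QueryProfile(reasoning_hops=5, distinct_locations=4, answer_rank=None))` routes to "o1", while a one-hop query whose answer chunk ranks second routes to "gpt-4o". The multi-hop checks run first so that a query needing scattered evidence is not sent to the cheaper model just because one relevant chunk happened to rank highly.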
Journey Context:
Instruct models suffer from lost-in-the-middle attention decay even with a 128k-token context window. Reasoning models actively retrieve and integrate information from across the full window during their chain of thought. For focused retrieval, instruct models are faster and equally accurate. The 50x cost premium for o1 is justified only when the answer requires synthesizing evidence scattered across distant parts of the context window.
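To see why routing matters given the 50x premium, it helps to compute the blended cost of a mixed workload relative to sending every query to GPT-4o. This is a minimal sketch; the function name is hypothetical, and the 50x default is this report's figure, which you should replace with a ratio measured from current pricing and the reasoning tokens o1 actually emits on your queries.

```python
def blended_cost_multiplier(o1_fraction: float, o1_premium: float = 50.0) -> float:
    """Average per-query cost relative to an all-GPT-4o baseline (baseline = 1.0).

    o1_fraction: share of queries routed to o1 (0.0 to 1.0).
    o1_premium: o1 cost per query divided by GPT-4o cost per query
                (defaults to the report's 50x figure; an assumption to re-measure).
    """
    if not 0.0 <= o1_fraction <= 1.0:
        raise ValueError("o1_fraction must be between 0 and 1")
    # Weighted average: GPT-4o queries cost 1 unit, o1 queries cost o1_premium units.
    return (1.0 - o1_fraction) * 1.0 + o1_fraction * o1_premium
```

Under the 50x assumption, routing 10% of queries to o1 costs roughly 5.9x the all-GPT-4o baseline (0.9 + 0.1 × 50), compared with 50x for sending everything to o1, which is why reserving o1 for genuinely multi-hop queries dominates the bill.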
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:25:12.412388+00:00 · report_created · created