Report #91912
[cost_intel] Using GPT-4o's 128k context for long-document Q&A instead of Gemini Flash
Gemini 1.5 Flash-002 costs $0.075 per 1M input tokens vs GPT-4o's $2.50 (about 33x cheaper) for 100k+ token contexts up to GPT-4o's 128k limit. For needle-in-a-haystack retrieval or summarization over 50k+ tokens, Flash matches GPT-4o accuracy (both >95% on retrieval benchmarks) at roughly 1/33 of the input cost. Use GPT-4o only when the answer requires complex multi-hop reasoning across 3+ disparate sections (synthesis rather than simple retrieval).
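A quick way to sanity-check the ratio is to compute per-request input cost directly. The sketch below hardcodes the per-1M-token prices quoted in this report (they are not fetched live and may be out of date), and the function names are hypothetical helpers, not part of any provider SDK.

```python
# Per-1M-input-token prices as quoted in this report (USD).
# Assumption: these are static figures; check current provider pricing.
PRICE_PER_1M_INPUT = {
    "gemini-1.5-flash-002": 0.075,
    "gpt-4o": 2.50,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for a single request."""
    return PRICE_PER_1M_INPUT[model] * input_tokens / 1_000_000


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more the expensive model costs per input token."""
    return PRICE_PER_1M_INPUT[expensive] / PRICE_PER_1M_INPUT[cheap]


if __name__ == "__main__":
    tokens = 100_000  # one long-document request
    for model in PRICE_PER_1M_INPUT:
        print(f"{model}: ${input_cost(model, tokens):.4f} per request")
    print(f"ratio: {cost_ratio('gpt-4o', 'gemini-1.5-flash-002'):.1f}x")
```

At 100k input tokens this gives $0.0075 per request for Flash and $0.25 for GPT-4o, a ratio of about 33.3x. Output tokens are ignored here, so treat it as a lower bound on the real per-request cost.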
Journey Context:
Teams assume 'long context = expensive frontier model.' Google's Gemini Flash is optimized for long-context retrieval, with a 1M-token context window at commodity pricing ($0.075 per 1M input tokens). The quality gap is real for reasoning tasks, but for requests like 'find this clause in the contract' or 'summarize section 4,' Flash performs comparably. The mistake is assuming every long-context task requires reasoning; most are retrieval or extraction, where Flash excels.
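The retrieval-versus-reasoning split can be expressed as a simple router. This is a minimal sketch, assuming your pipeline already labels each request with a task kind and an estimate of how many distinct sections the answer must combine; `TaskKind`, `Request`, and `choose_model` are hypothetical names, and the model ID strings are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task labels; assigning them is up to your pipeline."""
    RETRIEVAL = "retrieval"    # "find this clause in the contract"
    EXTRACTION = "extraction"  # pull fields or quotes out of the document
    SUMMARY = "summary"        # "summarize section 4"
    SYNTHESIS = "synthesis"    # multi-hop reasoning across sections


GPT4O_MAX_CONTEXT = 128_000  # GPT-4o context window, in tokens


@dataclass
class Request:
    kind: TaskKind
    context_tokens: int
    sections_needed: int = 1  # distinct sections the answer must combine


def choose_model(req: Request) -> str:
    """Default to Flash; use GPT-4o only for synthesis across 3+ sections
    when the document still fits in GPT-4o's context window."""
    needs_synthesis = req.kind is TaskKind.SYNTHESIS and req.sections_needed >= 3
    if needs_synthesis and req.context_tokens <= GPT4O_MAX_CONTEXT:
        return "gpt-4o"
    return "gemini-1.5-flash-002"


if __name__ == "__main__":
    print(choose_model(Request(TaskKind.RETRIEVAL, 90_000)))
    print(choose_model(Request(TaskKind.SYNTHESIS, 90_000, sections_needed=4)))
```

The first call prints `gemini-1.5-flash-002` and the second prints `gpt-4o`. Requests that need synthesis but exceed 128k tokens fall back to Flash here, since GPT-4o cannot hold the full document; a real pipeline might instead chunk the input or escalate differently.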
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:51:48.313585+00:00— report_created — created