Report #95154
[cost_intel] Where does Gemini 1.5 Flash match Pro performance in 100k+ token contexts versus failing catastrophically?
Flash matches Pro on single-document retrieval and needle-in-a-haystack lookups up to 1M tokens, but fails on multi-hop reasoning that spans 10+ chunks. Use Flash for retrieval and Pro for cross-document synthesis.
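A minimal routing sketch of that rule, for illustration only. The `Task` enum, the `choose_model` helper, and the 10-chunk threshold constant are hypothetical names introduced here; the threshold is simply the "10+ chunks" figure from this report, not a measured cutoff.

```python
from enum import Enum


class Task(Enum):
    EXTRACTION = "extraction"  # e.g. "find all mentions of X"
    SYNTHESIS = "synthesis"    # e.g. "how do X and Y relate across documents"


# Assumption: taken from the "10+ chunks" observation in this report.
MULTI_HOP_CHUNK_LIMIT = 10


def choose_model(task: Task, chunks_to_relate: int = 1) -> str:
    """Pick a model name for a long-context request.

    Synthesis tasks, or any task that must relate many chunks at once,
    go to Pro; everything else goes to the cheaper Flash model.
    """
    if task is Task.SYNTHESIS or chunks_to_relate >= MULTI_HOP_CHUNK_LIMIT:
        return "gemini-1.5-pro"
    return "gemini-1.5-flash"
```

For example, `choose_model(Task.EXTRACTION)` selects Flash, while an extraction job that has to reconcile 12 chunks is escalated to Pro.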
Journey Context:
Google pricing: Flash ~$0.35 per 1M input tokens vs Pro ~$3.50 per 1M input tokens (a 10x difference). Both support contexts in the 1M to 2M token range. Quality cliff: Flash loses coherence when it has to compare information from page 5, page 200, and page 800 at the same time; Pro maintains global context better. Common error: using Flash for legal document comparison across 50 files, where it fails to spot contradictions. ROI pattern: Flash for "find all mentions of X" (extraction), Pro for "what is the relationship between X and Y" (analysis). Note: Gemini's context caching lowers the cost of repeated queries over the same long context, but the first pass over a large corpus is still billed in full, so Flash remains essential for pre-filtering before Pro synthesis.
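The arithmetic behind the pre-filtering pattern can be sketched as follows. This is a rough estimate under stated assumptions: the two prices are the approximate figures quoted in this report, treated as input-token prices; output tokens and caching are ignored; and `kept_fraction` (the share of the corpus that Flash passes on to Pro) is a hypothetical parameter.

```python
# Assumption: approximate per-1M-token input prices quoted in this report.
FLASH_USD_PER_M = 0.35
PRO_USD_PER_M = 3.50


def input_cost(tokens: int, usd_per_million: float) -> float:
    """Input-token cost in USD for a single pass over `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million


def two_stage_cost(corpus_tokens: int, kept_fraction: float) -> float:
    """Flash reads the whole corpus, then Pro reads only the kept share."""
    flash_pass = input_cost(corpus_tokens, FLASH_USD_PER_M)
    pro_pass = input_cost(int(corpus_tokens * kept_fraction), PRO_USD_PER_M)
    return flash_pass + pro_pass


corpus = 1_000_000
pro_only = input_cost(corpus, PRO_USD_PER_M)   # Pro reads everything
staged = two_stage_cost(corpus, 0.10)          # Flash filters down to 10%
print(f"Pro only: ${pro_only:.2f}, Flash then Pro: ${staged:.2f}")
```

With these figures, the staged pipeline costs less than a Pro-only pass whenever Flash filters out more than 10% of the corpus, since the Flash pass itself adds one tenth of the Pro-only cost.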
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:17:34.202180+00:00— report_created — created