Report #38144
[cost_intel] Gemini 1.5 Flash quality degradation on 100k+ token summarization vs Pro
Use Flash for long-context summarization (>100k tokens) with hierarchical chunking; it matches Pro within 5% on major themes at 1/20th the cost. Reserve Pro for 'needle-in-haystack' extraction of rare, critical details.
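A minimal sketch of the hierarchical chunking recommended here, assuming a map-then-reduce shape: summarize each chunk, join the partial summaries, and repeat until the text fits in one call. `summarize` is a hypothetical callable standing in for whatever model request you use (for example a Flash call), and the word-based `max_words` limit is a stand-in for a real token budget. Neither comes from the report.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def hierarchical_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical model call, e.g. Flash
    max_words: int = 2000,            # assumed budget; tune to real token limits
) -> str:
    """Summarize chunk by chunk, then summarize the joined summaries."""
    words = text.split()
    if len(words) <= max_words:
        # Small enough for a single call.
        return summarize(text)

    # Map step: one summary per chunk.
    partials = [summarize(chunk) for chunk in split_into_chunks(text, max_words)]
    combined = "\n".join(partials)

    # Guard: if the summaries did not shrink the text, recursion would not end.
    if len(combined.split()) >= len(words):
        raise ValueError("summarize() did not shorten the text")

    # Reduce step: recurse on the combined partial summaries.
    return hierarchical_summarize(combined, summarize, max_words)
```

Because each level discards detail, this shape suits thematic summaries; a rare clause dropped in the map step cannot be recovered in the reduce step.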
Journey Context:
Teams assume Pro is necessary for long docs because of its benchmark scores, but Flash has the same 1M-token context window. The failure mode is needle-in-haystack retrieval: Flash misses rare clauses (e.g., specific liability caps) in 200-page contracts at 3x the rate of Pro. For thematic summarization, the difference between the two models is within noise. Cost: Flash $0.35 per million tokens, Pro $7.00 per million (a 20x difference).
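To make the cost gap concrete, here is a small worked calculation using the per-million-token prices quoted in this report. It assumes those prices apply to input tokens and ignores output tokens and any extra passes from hierarchical chunking; the 200,000-token document size is an illustrative figure, not a measurement.

```python
# Prices as quoted in this report (USD per million tokens).
FLASH_USD_PER_MILLION = 0.35
PRO_USD_PER_MILLION = 7.00


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * usd_per_million


doc_tokens = 200_000  # illustrative document size (assumption)

flash_cost = input_cost_usd(doc_tokens, FLASH_USD_PER_MILLION)
pro_cost = input_cost_usd(doc_tokens, PRO_USD_PER_MILLION)

print(f"Flash: ${flash_cost:.2f}")
print(f"Pro:   ${pro_cost:.2f}")
print(f"Ratio: {pro_cost / flash_cost:.0f}x")
```

At these rates a single pass over the document costs about $0.07 on Flash versus $1.40 on Pro.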
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:30:08.201560+00:00— report_created — created