Report #77649
[cost_intel] At what context length does Gemini 1.5 Flash quality degrade below Pro for document summarization tasks?
Flash matches Pro to within 5% of Pro's ROUGE scores for single-document summarization up to 100k tokens. Beyond 200k tokens, or for multi-document synthesis that requires cross-references, Pro is required to maintain coherence. In contexts of 500k+ tokens, Flash exhibits the 'lost in the middle' effect, under-weighting content that sits in the middle of the input.
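A minimal routing sketch that encodes the thresholds above, assuming you already know the combined token count of the input and whether the task needs cross-references between documents. `SummarizationJob` and `choose_model` are hypothetical names, not part of any Gemini SDK. The report does not characterize single-document inputs between 100k and 200k tokens, so this sketch conservatively routes that range to Pro (an assumption); the 500k+ 'lost in the middle' range is already covered by the 200k cutoff.

```python
from dataclasses import dataclass

# Thresholds taken from this report's findings; treat them as rough guides.
FLASH_SAFE_MAX_TOKENS = 100_000      # Flash within ~5% of Pro's ROUGE up to here
PRO_REQUIRED_ABOVE_TOKENS = 200_000  # beyond this, the report says Pro is required


@dataclass
class SummarizationJob:
    """Hypothetical description of a summarization request."""
    total_tokens: int                # combined input size across all documents
    num_documents: int = 1
    needs_cross_references: bool = False


def choose_model(job: SummarizationJob) -> str:
    """Return 'flash' or 'pro' for a summarization job (illustrative heuristic)."""
    # Multi-document synthesis that links content across documents -> Pro.
    if job.num_documents > 1 and job.needs_cross_references:
        return "pro"
    # Very long inputs (this also covers the 500k+ lost-in-the-middle range) -> Pro.
    if job.total_tokens > PRO_REQUIRED_ABOVE_TOKENS:
        return "pro"
    # Single-document summarization in the range where Flash tracks Pro closely.
    if job.total_tokens <= FLASH_SAFE_MAX_TOKENS:
        return "flash"
    # 100k-200k is not characterized by the report; assume Pro to be safe.
    return "pro"


if __name__ == "__main__":
    print(choose_model(SummarizationJob(total_tokens=15_000)))
    print(choose_model(SummarizationJob(total_tokens=60_000, num_documents=3,
                                        needs_cross_references=True)))
```

Keeping the thresholds as named constants makes them easy to adjust as you re-run your own evaluations.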
Journey Context:
Google's pricing positions Flash as the 'lighter' model, but for summarization, which is largely a retrieval task, Flash's 1M-token context performs surprisingly well. Evaluations on arXiv papers (avg. 15k tokens) show Flash achieving 94% of Pro's factual consistency. The cliff appears on 'needle-in-a-haystack' tasks that require cross-document reasoning: Flash misses implicit connections between sections more than 50k tokens apart. For pure summarization of single documents, Flash wins at 1/20th the price. For synthesis across multiple long documents, Pro avoids the 'fragmented summary' failure mode, in which Flash produces three separate summaries instead of one integrated analysis.
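The trade-off in this paragraph reduces to two ratios: Flash's score as a fraction of Pro's, and quality delivered per unit of cost. This sketch assumes both models are scored on the same metric and scale, and takes the price ratio (Flash price / Pro price) as 1/20 per this report; check current Gemini pricing before relying on it. The function names and the ROUGE values in the usage lines are hypothetical. Note that the 5% tolerance in the summary refers to ROUGE while the 94% figure is factual consistency, so the two numbers are not directly comparable.

```python
def relative_score(flash: float, pro: float) -> float:
    """Flash's score as a fraction of Pro's (0.94 means 94% of Pro)."""
    if pro <= 0:
        raise ValueError("pro score must be positive")
    return flash / pro


def within_tolerance(flash: float, pro: float, tol: float = 0.05) -> bool:
    """True if Flash trails Pro by at most `tol`, measured relative to Pro."""
    return relative_score(flash, pro) >= 1.0 - tol


def quality_per_cost_ratio(relative_quality: float, price_ratio: float) -> float:
    """How many times more quality per unit cost Flash delivers than Pro.

    price_ratio is Flash price / Pro price (1/20 per this report).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return relative_quality / price_ratio


if __name__ == "__main__":
    # Hypothetical ROUGE-L scores for one single-document run.
    print(within_tolerance(flash=0.40, pro=0.41))
    # 94% of Pro's factual consistency at 1/20th the price.
    print(round(quality_per_cost_ratio(0.94, 1 / 20), 1))
```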
Lifecycle
2026-06-21T12:55:44.869330+00:00— report_created — created