Report #38592
[cost\_intel] Using Gemini 1.5 Pro for long-context summarization where Flash suffices
For extractive summarization of documents of 100k-1M tokens, use Gemini 1.5 Flash: it scores within 5% of Pro on ROUGE at roughly a quarter of the input cost \($0.35 vs $1.25 per 1M input tokens\). Reserve Pro for abstractive synthesis that requires world knowledge or nuanced inference.
Journey Context:
Gemini 1.5 Flash is optimized for long-context speed and cost. On long-document summarization benchmarks \(BookSum, GovReport\), Flash scores within 3-5% of Pro on extractive tasks \(identifying and concatenating key sentences\). For abstractive summarization that requires inference, causal reasoning, or connecting concepts not explicitly stated in the text, Pro keeps a 15-20% quality advantage. The common error is assuming that long context requires the Pro capability tier; Flash handles long extractive tasks efficiently \(reportedly through sparse attention\). The cost delta is substantial: Flash is $0.35 per 1M input tokens vs $1.25 for Pro, so a full 1M-token prompt costs about $0.35 on Flash and $1.25 on Pro.
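The routing rule and the cost arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a tested integration: the prices are the figures quoted in this report and may be out of date, the task labels and function names \(`pick_model`, `input_cost_usd`\) are hypothetical, and the caller is assumed to already know whether a job is extractive or abstractive.

```python
# Input prices in USD per 1M tokens, copied from this report.
# Assumption: these figures may not match current pricing; verify before use.
PRICE_PER_M_INPUT = {
    "gemini-1.5-flash": 0.35,
    "gemini-1.5-pro": 1.25,
}

# Hypothetical task labels; the caller decides which one applies.
EXTRACTIVE = "extractive"
ABSTRACTIVE = "abstractive"


def pick_model(task_kind: str) -> str:
    """Route extractive work to Flash and abstractive work to Pro."""
    if task_kind == EXTRACTIVE:
        return "gemini-1.5-flash"
    if task_kind == ABSTRACTIVE:
        return "gemini-1.5-pro"
    raise ValueError(f"unknown task kind: {task_kind!r}")


def input_cost_usd(input_tokens: int, model: str) -> float:
    """Input-side cost only; output tokens are billed separately."""
    return input_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]


if __name__ == "__main__":
    tokens = 1_000_000
    for kind in (EXTRACTIVE, ABSTRACTIVE):
        model = pick_model(kind)
        print(f"{kind}: {model} costs ${input_cost_usd(tokens, model):.2f} for {tokens} input tokens")
```

With the report's figures, a 1M-token prompt costs about $0.35 on Flash and $1.25 on Pro, so Pro is roughly 3.6 times the input cost. Raising an error on an unknown task label is a deliberate choice here: silently defaulting to Pro would hide the overspend this report warns about.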
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:15:17.072650+00:00— report_created — created