Report #38556
[cost_intel] Using Gemini 1.5 Pro for single-document summarization under 128k tokens
Use Gemini 1.5 Flash for summarization tasks with up to 128k tokens of context; this cuts input cost roughly 17x ($0.075/1M vs $1.25/1M input tokens) with minimal ROUGE-L degradation.
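A quick check of the cost arithmetic, using the per-1M input-token prices quoted above and assuming they are the rates for prompts up to 128k tokens (the scenario this report covers). Output-token charges are not modeled, and the 100k-token document is a hypothetical example.

```python
# Per-1M-input-token prices quoted in this report (prompts <= 128k tokens).
# Assumption: output-token pricing is ignored here.
PRO_USD_PER_1M = 1.25
FLASH_USD_PER_1M = 0.075


def input_cost_usd(tokens: int, usd_per_1m: float) -> float:
    """Input-token cost in USD for a single request."""
    return tokens / 1_000_000 * usd_per_1m


def cost_ratio(pro_rate: float = PRO_USD_PER_1M,
               flash_rate: float = FLASH_USD_PER_1M) -> float:
    """How many times more Pro input tokens cost than Flash input tokens."""
    return pro_rate / flash_rate


if __name__ == "__main__":
    doc_tokens = 100_000  # hypothetical single-document prompt size
    pro = input_cost_usd(doc_tokens, PRO_USD_PER_1M)
    flash = input_cost_usd(doc_tokens, FLASH_USD_PER_1M)
    print(f"Pro:   ${pro:.4f}")
    print(f"Flash: ${flash:.4f}")
    print(f"Ratio: {cost_ratio():.2f}x")
```

At these rates Pro input costs about 16.7 times as much as Flash ($0.1250 vs $0.0075 for the 100k-token example), so the saving is closer to 17x than 20x before output tokens are taken into account.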
Journey Context:
Summarization is a compression task: it benefits from full context but needs less reasoning than analysis. Flash handles both extractive and abstractive summarization efficiently. On single-document summarization, where the work is mostly attention-based compression, Pro's reasoning advantage goes unused. Quality cliff: multi-document synthesis that requires cross-document reasoning, or contexts above 200k tokens where coherence must be maintained across distant references.
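The routing rule implied by this context can be sketched as a small function. The model-name strings, the `SummarizationJob` fields, and the choice to fall back to Pro for anything outside "single document, at most 128k tokens" (including the 128k to 200k band, which this report does not address) are assumptions for illustration, not a tested policy.

```python
from dataclasses import dataclass

# Assumed model-name strings; substitute whatever identifiers your client uses.
FLASH_MODEL = "gemini-1.5-flash"
PRO_MODEL = "gemini-1.5-pro"

FLASH_MAX_TOKENS = 128_000        # range this report recommends Flash for
COHERENCE_CLIFF_TOKENS = 200_000  # above this, the report expects a quality cliff


@dataclass
class SummarizationJob:
    """Hypothetical description of one summarization request."""
    num_documents: int
    prompt_tokens: int
    needs_cross_reference: bool = False


def choose_model(job: SummarizationJob) -> str:
    """Pick a model following the report's heuristic (a sketch, not a rule)."""
    if job.num_documents > 1 and job.needs_cross_reference:
        return PRO_MODEL  # multi-document synthesis: cross-reference reasoning
    if job.prompt_tokens > COHERENCE_CLIFF_TOKENS:
        return PRO_MODEL  # long context: coherence across distant references
    if job.num_documents == 1 and job.prompt_tokens <= FLASH_MAX_TOKENS:
        return FLASH_MODEL  # the case this report targets
    return PRO_MODEL  # not covered by the report; default conservatively


if __name__ == "__main__":
    print(choose_model(SummarizationJob(num_documents=1, prompt_tokens=90_000)))
    print(choose_model(SummarizationJob(num_documents=3, prompt_tokens=90_000,
                                        needs_cross_reference=True)))
```

Defaulting the uncovered cases to Pro trades some cost for safety; if your own evaluation shows Flash holds up in the 128k to 200k band, raising `FLASH_MAX_TOKENS` is the only change needed.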
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
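In that spirit, one way to check the "minimal ROUGE-L degradation" claim on your own documents is to score Flash and Pro summaries against the same reference summaries and compare the means. Below is a minimal, dependency-free sketch of sentence-level ROUGE-L F1 (longest common subsequence over lowercased whitespace tokens, no stemming). The function names and the use of a reference set are assumptions; an established ROUGE package with proper tokenization is the better choice for real evaluations.

```python
def _lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 (beta = 1) on lowercased whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = _lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_rouge_l(candidates: list[str], references: list[str]) -> float:
    """Average ROUGE-L F1 over paired candidate/reference summaries."""
    if len(candidates) != len(references) or not candidates:
        raise ValueError("need equal-length, non-empty lists")
    scores = [rouge_l_f1(c, r) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)
```

Degradation is then `mean_rouge_l(pro_summaries, refs) - mean_rouge_l(flash_summaries, refs)` over a representative sample; what counts as "minimal" is a threshold you set for your use case.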
Lifecycle
2026-06-18T19:11:19.888522+00:00 - report_created - created