Report #58275
[cost_intel] Using frontier models for extractive summarization where small models match quality
Use a small model (Haiku, Flash, or mini) for extractive summarization, where the task is selecting and condensing key points from source material. Reserve frontier models for abstractive summarization with strict length, style, or audience constraints. Small models come within 2-5% of frontier quality on extraction but degrade by 15-25% on constrained abstractive tasks.
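The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: the `SummaryTask` fields, the tier labels, and `pick_tier` are all hypothetical names, and sending unconstrained abstractive work to the small tier is an assumption read from the wording above.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whatever model IDs your provider exposes.
SMALL_TIER = "small"
FRONTIER_TIER = "frontier"


@dataclass(frozen=True)
class SummaryTask:
    """Describes one summarization request (all fields are illustrative)."""
    abstractive: bool = False          # False means extractive: select and condense
    length_constraint: bool = False    # e.g. "exactly 3 paragraphs"
    style_constraint: bool = False     # e.g. "executive tone, no jargon"
    audience_constraint: bool = False  # e.g. "for non-technical readers"


def pick_tier(task: SummaryTask) -> str:
    """Route to the frontier tier only for constrained abstractive work."""
    constrained = (
        task.length_constraint
        or task.style_constraint
        or task.audience_constraint
    )
    if task.abstractive and constrained:
        return FRONTIER_TIER
    # Assumption: extractive tasks and unconstrained abstraction go to the small tier.
    return SMALL_TIER


if __name__ == "__main__":
    print(pick_tier(SummaryTask()))
    print(pick_tier(SummaryTask(abstractive=True, style_constraint=True)))
```

In practice the flags would need to come from the request itself, for example from the prompt template in use; how to detect them reliably is outside the scope of this sketch.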
Journey Context:
Extractive summarization is essentially a classification task: is this sentence or section important enough to include? Small models do well here because each decision is local and pattern-based. Abstractive summarization requires understanding the whole document, generating novel text, and satisfying several constraints at once, such as exactly 3 paragraphs, executive tone, and no jargon. This multi-constraint generation is where small-model quality drops sharply. The degradation signature: small models either over-compress, dropping critical information to hit the length constraint, or under-compress, paraphrasing the source text without actually synthesizing it. They also struggle with style constraints, producing generic text instead of the requested tone. Cost comparison: Haiku at $1/M output tokens vs Sonnet at $15/M output tokens is a 15x difference on output-heavy summarization tasks. For a pipeline generating 10k summaries per day averaging 500 output tokens each (5M output tokens per day), that is about $5/day on Haiku vs $75/day on Sonnet, counting output tokens only.
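The output-cost arithmetic for that pipeline can be checked with a few lines. This is a back-of-the-envelope sketch: the per-million-token prices are the figures quoted in this report, treated here as assumptions and not as verified current pricing. Input-token cost is ignored, and `daily_output_cost` is a hypothetical helper name.

```python
def daily_output_cost(
    summaries_per_day: int,
    output_tokens_each: int,
    usd_per_million_output_tokens: float,
) -> float:
    """Return the daily spend on output tokens only, in USD."""
    total_tokens = summaries_per_day * output_tokens_each
    return total_tokens / 1_000_000 * usd_per_million_output_tokens


# Workload from the report: 10k summaries/day at 500 output tokens each.
SUMMARIES_PER_DAY = 10_000
OUTPUT_TOKENS_EACH = 500

# Assumed prices (USD per million output tokens), as quoted in the report.
small_cost = daily_output_cost(SUMMARIES_PER_DAY, OUTPUT_TOKENS_EACH, 1.0)
frontier_cost = daily_output_cost(SUMMARIES_PER_DAY, OUTPUT_TOKENS_EACH, 15.0)

print(f"small model:    ${small_cost:,.2f}/day")
print(f"frontier model: ${frontier_cost:,.2f}/day")
print(f"ratio:          {frontier_cost / small_cost:.0f}x")
```

At this volume the workload is 5M output tokens per day, so the absolute gap is tens of dollars per day. The 15x ratio holds at any volume, which means the saving grows linearly with throughput.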
Lifecycle
2026-06-20T04:18:11.968462+00:00— report_created — created