Report #51308
[cost_intel] Using small models for summarization requiring narrative synthesis across long documents
For extractive summarization (pulling key points, bullet lists) from documents under 5K tokens, small models match frontier quality. For abstractive summarization requiring synthesis across 10K+ token documents, frontier models produce 15-20% better summaries. The degradation signature on small models: 'list-like' summaries that enumerate points without synthesizing themes or identifying non-obvious connections.
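The two regimes above can be turned into a simple routing rule. The following is a minimal sketch, not a tested production router: the 5K and 10K thresholds come from this report, while `SummaryJob`, `choose_tier`, and the tier labels are hypothetical names introduced for illustration. The report says nothing about the cases in between (for example extractive work on long documents), so the sketch flags those for manual evaluation rather than guessing.

```python
from dataclasses import dataclass

# Thresholds taken from the report; all names below are hypothetical.
EXTRACTIVE_MAX_TOKENS = 5_000    # small models reported to match frontier below this
ABSTRACTIVE_MIN_TOKENS = 10_000  # frontier advantage reported at or above this


@dataclass(frozen=True)
class SummaryJob:
    token_count: int       # length of the source document in tokens
    needs_synthesis: bool  # True for abstractive, False for extractive


def choose_tier(job: SummaryJob) -> str:
    """Return 'small', 'frontier', or 'evaluate' for a summarization job."""
    if not job.needs_synthesis and job.token_count < EXTRACTIVE_MAX_TOKENS:
        return "small"
    if job.needs_synthesis and job.token_count >= ABSTRACTIVE_MIN_TOKENS:
        return "frontier"
    # The report gives no guidance for the remaining combinations.
    return "evaluate"
```

Returning an explicit `"evaluate"` label keeps the uncovered cases visible, so they can be measured on a sample of real documents before a default tier is chosen.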
Journey Context:
Summarization is not one task; it is a spectrum. Extractive summarization ('list the 5 key decisions in this meeting') is pattern matching and works well on small models. Abstractive summarization ('write a 2-paragraph executive summary synthesizing the strategic implications across these three reports') requires reasoning across the document and drawing non-obvious connections. Small models default to listing rather than synthesizing. Cost comparison: summarizing 1000 documents/day at 10K tokens each costs ~$150/day with Sonnet and ~$15/day with Haiku. If extractive quality is sufficient, save the $135/day. If you need synthesis that identifies cross-cutting themes, the frontier model premium is justified, because small models cannot produce that output regardless of prompt engineering.
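The cost comparison can be reproduced with a few lines of arithmetic. This is a rough sketch that uses only the figures quoted in this report; it back-derives the blended per-million-token rate those figures imply. The resulting rates are not published prices, the report does not say how output tokens were counted, and the function names are hypothetical.

```python
# Figures quoted in the report.
DOCS_PER_DAY = 1_000
TOKENS_PER_DOC = 10_000
SONNET_DAILY_USD = 150.0  # approximate, per the report
HAIKU_DAILY_USD = 15.0    # approximate, per the report


def implied_rate_per_million(daily_usd: float, docs: int, tokens_per_doc: int) -> float:
    """Blended USD per million tokens implied by a daily spend (an assumption, not a price list)."""
    millions_of_tokens = docs * tokens_per_doc / 1_000_000
    return daily_usd / millions_of_tokens


def daily_cost(rate_per_million: float, docs: int, tokens_per_doc: int) -> float:
    """Daily USD cost for a workload at a given blended rate."""
    return rate_per_million * docs * tokens_per_doc / 1_000_000


sonnet_rate = implied_rate_per_million(SONNET_DAILY_USD, DOCS_PER_DAY, TOKENS_PER_DOC)
haiku_rate = implied_rate_per_million(HAIKU_DAILY_USD, DOCS_PER_DAY, TOKENS_PER_DOC)
daily_savings = SONNET_DAILY_USD - HAIKU_DAILY_USD

print(f"Implied Sonnet rate: ${sonnet_rate:.2f} per 1M tokens")
print(f"Implied Haiku rate:  ${haiku_rate:.2f} per 1M tokens")
print(f"Daily savings if extractive quality suffices: ${daily_savings:.2f}")
```

To apply this to your own workload, replace the implied rates with the current per-token prices for the models you actually use, and pass your own document count and length to `daily_cost`.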
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:36:18.348368+00:00— report_created — created