Report #70332
[cost_intel] Using small models for abstractive summarization based on their strong extractive summarization performance
Small models (Haiku and Flash) handle extractive summarization at near-frontier parity but degrade sharply on abstractive summarization that requires synthesis. Route extractive tasks to small models; use mid-tier or frontier models for abstractive tasks. The degradation signature: hallucinated bridge sentences and fabricated transitions between ideas that read fluently but are not grounded in the source.
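The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested policy: the tier labels are hypothetical placeholders rather than real model identifiers, and escalating long abstractive jobs to the frontier tier is an assumption layered on top of the report's 10K-token observation.

```python
from enum import Enum


class Task(Enum):
    EXTRACTIVE = "extractive"
    ABSTRACTIVE = "abstractive"


# Hypothetical tier labels; map these to whatever models you actually use.
SMALL, MID, FRONTIER = "small", "mid", "frontier"

# The report notes the quality gap widens past roughly 10K tokens.
LONG_DOC_TOKENS = 10_000


def choose_tier(task: Task, doc_tokens: int) -> str:
    """Pick a model tier for a summarization job."""
    if task is Task.EXTRACTIVE:
        # The source text constrains the output, so a small model is enough.
        return SMALL
    # Abstractive work never goes to the small tier.
    # Assumption: long documents are escalated to the frontier tier.
    return FRONTIER if doc_tokens > LONG_DOC_TOKENS else MID


print(choose_tier(Task.EXTRACTIVE, 50_000))
print(choose_tier(Task.ABSTRACTIVE, 4_000))
print(choose_tier(Task.ABSTRACTIVE, 25_000))
```

How a job gets classified as extractive or abstractive is left to the caller; in practice that usually follows from the prompt ("pull the key passages" versus "what is the overall argument").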
Journey Context:
Extractive summarization (selecting and condensing key passages) is largely pattern matching, and small models excel at it because the source material constrains the output. Abstractive summarization (synthesizing themes across a document, drawing novel conclusions) requires deep comprehension and reasoning. The quality cliff is sharp, not gradual: small models produce plausible-sounding summaries that insert facts not in the source, fabricate transitions between unrelated points, and miss the document's actual thesis while correctly summarizing individual sections. This is especially dangerous because the output reads fluently; the errors are semantic, not syntactic. To test, ask the model to summarize a document, then check each claim in the summary against the source. Small models typically have 2-5x the hallucination rate of frontier models on abstractive tasks. For long documents (over 10K tokens), the gap widens because small models lose track of earlier sections.
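The claim-by-claim check described above can be partly automated as a first pass. The sketch below is a crude lexical proxy, not a real grounding check: it flags summary sentences whose content words mostly do not appear in the source. The function names, the stopword list, and the 0.6 threshold are all illustrative assumptions. Because the errors in question are semantic, a fabricated claim that reuses source vocabulary will pass this filter, so flagged sentences are candidates for manual review and unflagged ones are not thereby verified.

```python
import re

# Small illustrative stopword list (assumption); extend as needed.
STOPWORDS = frozenset({
    "the", "and", "that", "this", "with", "for", "was", "were", "are",
    "has", "have", "had", "from", "into", "its", "their", "which",
})


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens longer than two characters, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOPWORDS}


def split_sentences(text: str) -> list[str]:
    """Naive split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def ungrounded_sentences(source: str, summary: str, min_overlap: float = 0.6):
    """Return (sentence, overlap) pairs for summary sentences that share
    too few content words with the source."""
    source_words = content_words(source)
    flagged = []
    for sentence in split_sentences(summary):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < min_overlap:
            flagged.append((sentence, round(overlap, 2)))
    return flagged


source = (
    "The cache layer reduced latency by 40 percent. "
    "The team migrated the database in March."
)
summary = (
    "The cache layer reduced latency. "
    "As a result, customers renewed contracts early."
)
for sentence, overlap in ungrounded_sentences(source, summary):
    print(f"{overlap:.2f}  {sentence}")
```

In this toy input the second summary sentence is a fabricated bridge: none of its content words occur in the source, so it is the only one flagged. Comparing the flagged-sentence rate between a small and a frontier model on the same documents gives a rough signal of the gap, but a human or an entailment-style check is still needed to confirm each claim.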
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:38:08.813367+00:00— report_created — created