Report #71022
[cost_intel] Small model summarization quality degrades non-linearly on complex documents
Small models (Haiku/Flash) produce adequate summaries for straightforward narrative content (news, meeting notes, emails), within 3-5% of frontier quality. For technical, legal, or multi-topic documents, quality drops by 20-30%, and the drop is abrupt rather than proportional to document difficulty. Always benchmark on your hardest 10% of documents, not the easy majority.
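One way to build the "hardest 10%" benchmark set is to rank documents by a complexity score and keep the top slice. The sketch below is only an illustration: `hardest_slice` and the `complexity` callable are hypothetical names, and the score itself (length, topic count, a domain classifier) is something you would have to supply.

```python
import math
from typing import Callable, Sequence


def hardest_slice(
    docs: Sequence[str],
    complexity: Callable[[str], float],
    fraction: float = 0.10,
) -> list[str]:
    """Return the `fraction` of `docs` with the highest complexity score.

    `complexity` is a caller-supplied scoring function (an assumption of
    this sketch); higher scores mean harder documents.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not docs:
        return []
    # Always keep at least one document so small corpora still get a hard set.
    k = max(1, math.ceil(len(docs) * fraction))
    ranked = sorted(docs, key=complexity, reverse=True)
    return ranked[:k]
```

Using word count as a crude stand-in for complexity, `hardest_slice(corpus, lambda d: len(d.split()))` would return the longest tenth of `corpus` for side-by-side evaluation against a frontier model.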
Journey Context:
Summarization quality degradation is deceptive. On simple documents, small models look great; on complex documents, the gap is enormous. The failure mode is not gradual; it is a cliff. Small models start hallucinating specific details, omitting key provisions in legal text, or conflating distinct topics in multi-subject documents. Because the degradation is non-linear, you cannot extrapolate from easy documents to hard ones. The common mistake is to test on 20 easy documents, see 95% quality, deploy, and then get 70% quality on the hard tail that matters most. The cost-effective pattern is to use a small model as a first pass, then have a frontier model review and correct the summaries of documents above a complexity threshold (detected by length, topic count, or a domain classifier). This gets you 80% of the cost savings with 95% of the quality.
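The first-pass-plus-review pattern can be sketched roughly as follows. Everything named here is hypothetical: `small` and `frontier_review` stand in for whatever model calls you use, `topic_count` and `domain` are assumed to come from your own detectors, and the threshold values are placeholders to tune on your own data rather than recommendations.

```python
from dataclasses import dataclass
from typing import Callable

Summarizer = Callable[[str], str]     # document -> draft summary
Reviewer = Callable[[str, str], str]  # (document, draft) -> corrected summary


@dataclass(frozen=True)
class Thresholds:
    # Illustrative values only; calibrate against your hardest documents.
    max_words: int = 2000
    max_topics: int = 3
    hard_domains: frozenset[str] = frozenset({"legal", "technical"})


def is_complex(doc: str, topic_count: int, domain: str,
               t: Thresholds = Thresholds()) -> bool:
    """Flag a document if any one signal (length, topics, domain) trips."""
    return (
        len(doc.split()) > t.max_words
        or topic_count > t.max_topics
        or domain in t.hard_domains
    )


def summarize(doc: str, topic_count: int, domain: str,
              small: Summarizer, frontier_review: Reviewer,
              t: Thresholds = Thresholds()) -> str:
    """Small model drafts every summary; the frontier model reviews only
    the drafts of documents flagged as complex."""
    draft = small(doc)
    if is_complex(doc, topic_count, domain, t):
        return frontier_review(doc, draft)
    return draft
```

The any-signal-trips rule errs toward sending borderline documents to review, which trades some cost savings for protection against the quality cliff described above.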
Lifecycle
2026-06-21T01:47:30.780048+00:00— report_created — created