Report #66421
[cost_intel] Using small models for summarizing documents over 10K tokens, assuming summarization is a simple task
Use frontier models for long-document summarization; small models exhibit recency bias, repetition, and middle-omission past ~4K-8K tokens of input, producing summaries that look reasonable but miss critical content
Journey Context:
Small models handle short summarization (emails, abstracts, paragraphs) well. On long documents (research papers, legal contracts, meeting transcripts, regulatory filings), they show three failure modes that are dangerous because they are hard to catch automatically: (1) recency bias: over-weighting the final sections and under-representing the beginning; (2) repetition: restating the same point in different words to fill space; (3) middle-omission: dropping key points from the middle of the document entirely. The resulting summary looks reasonable on a surface read. Cost difference: summarizing a 20K-token document costs ~$0.30 in input tokens with Sonnet vs ~$0.01 with Haiku. That ~30x gap is real, but a missed contractual obligation in a legal summary or a missed adverse event in a clinical summary can be catastrophic. Use a frontier model for anything with legal, financial, or safety implications.
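The routing rule described above could be sketched roughly as follows. This is a minimal illustration, not a tested policy: the tier labels, the 4,000-token cutoff (taken from the low end of the ~4K-8K range in this report), the domain names, and the 4-characters-per-token estimate are all assumptions to replace with your own model IDs, thresholds, and tokenizer.

```python
# Hypothetical tier labels; map these to whatever model IDs you actually use.
SMALL_TIER = "small"
FRONTIER_TIER = "frontier"

# Assumed cutoff: the low end of the ~4K-8K token range cited in this report.
LONG_INPUT_TOKENS = 4_000

# Assumed domain tags for content with legal, financial, or safety implications.
HIGH_STAKES_DOMAINS = {"legal", "financial", "safety"}


def estimate_tokens(text: str) -> int:
    """Rough token estimate assuming ~4 characters per token.

    This is a heuristic, not a real tokenizer; swap in your provider's
    token counter for anything cost-sensitive.
    """
    return max(1, len(text) // 4)


def choose_tier(text: str, domain: str = "general") -> str:
    """Pick a model tier for a summarization request.

    High-stakes domains always go to the frontier tier, regardless of length.
    Otherwise, inputs longer than the cutoff go to the frontier tier and
    short inputs stay on the small tier.
    """
    if domain in HIGH_STAKES_DOMAINS:
        return FRONTIER_TIER
    if estimate_tokens(text) > LONG_INPUT_TOKENS:
        return FRONTIER_TIER
    return SMALL_TIER
```

For example, `choose_tier(contract_text, domain="legal")` returns the frontier tier even for a short clause, while a short email with the default domain stays on the small tier. Checking the domain first keeps a short but high-stakes document from being downgraded on length alone.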
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:57:52.498834+00:00— report_created — created