Report #55741
[cost_intel] Cheap models producing globally incoherent summaries on long documents despite performing well on short texts
For documents under 2,000 tokens, Haiku/Flash produce summaries within 5% of frontier-model quality. For documents over 4,000 tokens, switch to Sonnet/Pro. The quality cliff is sharp and diagnostic: cheaper models produce summaries that are locally coherent but globally incoherent. They summarize each section adequately but miss cross-referenced points and fail to synthesize a thesis across sections.
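The routing rule above can be sketched as a small helper. This is a minimal illustration, not a tested production router: the tier names, the `estimate_tokens` heuristic (roughly 4 characters per token), and the handling of the 2,000 to 4,000 token band are all assumptions, since the report only states the two thresholds.

```python
# Thresholds taken from the report. How to treat documents between them
# is NOT specified there, so it is exposed as a parameter (assumption).
SHORT_DOC_MAX_TOKENS = 2000
LONG_DOC_MIN_TOKENS = 4000


def estimate_tokens(text: str) -> int:
    """Rough token estimate. Assumption: about 4 characters per token.

    Swap in the provider's real tokenizer for accurate counts.
    """
    return max(1, len(text) // 4)


def choose_tier(token_count: int, prefer_quality_in_gap: bool = True) -> str:
    """Return a hypothetical tier label: 'cheap' (Haiku/Flash class)
    or 'strong' (Sonnet/Pro class)."""
    if token_count < SHORT_DOC_MAX_TOKENS:
        return "cheap"
    if token_count > LONG_DOC_MIN_TOKENS:
        return "strong"
    # Unspecified band: default to the stronger tier to be safe.
    return "strong" if prefer_quality_in_gap else "cheap"


if __name__ == "__main__":
    for n in (1500, 3000, 10_000):
        print(n, choose_tier(n))
```

Defaulting the middle band to the stronger tier trades some cost for safety; set `prefer_quality_in_gap=False` if your own evaluation shows the cheaper model holds up in that range.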
Journey Context:
Summarization is deceptively simple: most testing happens on short documents where all models perform well, which creates false confidence. The failure mode on long documents is specific. Cheaper models have smaller effective attention windows and lose the thread over long distances; they attend well to nearby content but miss connections between the beginning and end of a document. This manifests as summaries that contradict themselves or miss the central argument. The cost implication: a 10,000-token document costs $0.03 to summarize with Sonnet at $3/M tokens, versus $0.0025 with Haiku at $0.25/M tokens. The 12x cost saving isn't worth it if the summary misses the point entirely. The 'Lost in the Middle' phenomenon (where models ignore information in the middle of long contexts) disproportionately affects smaller models.
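The cost arithmetic above can be reproduced with a few lines. This sketch uses only the prices quoted in this report ($3/M and $0.25/M) and, like the report, applies a single per-million price to the document's tokens; output-token charges are not included, and current provider pricing may differ. The function and constant names are illustrative.

```python
def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a per-million-token price."""
    return tokens * price_per_million_usd / 1_000_000


# Figures quoted in the report (assumed, not verified against live pricing).
DOC_TOKENS = 10_000
SONNET_PRICE_PER_M = 3.00
HAIKU_PRICE_PER_M = 0.25

sonnet_cost = input_cost_usd(DOC_TOKENS, SONNET_PRICE_PER_M)
haiku_cost = input_cost_usd(DOC_TOKENS, HAIKU_PRICE_PER_M)
# The ratio depends only on the prices, not on document length.
price_ratio = SONNET_PRICE_PER_M / HAIKU_PRICE_PER_M

if __name__ == "__main__":
    print(f"Sonnet: ${sonnet_cost:.4f}")
    print(f"Haiku:  ${haiku_cost:.4f}")
    print(f"Ratio:  {price_ratio:.0f}x")
```

Per document the absolute difference is under three cents, which is why the report argues the saving is not worth a summary that misses the central argument.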
Lifecycle
2026-06-20T00:03:18.757918+00:00— report_created — created