Report #71736
[cost\_intel] Small model summarization quality on long documents — lost in the middle
Use frontier models for summarizing documents over 10K tokens; small models show a U-shaped recall curve over input position and tend to miss crucial mid-document content. For documents under 2K tokens, small models are within 5% of frontier quality at 10-20x lower cost. Cost workaround for long docs: chunk-summarize with a small model, then synthesize the chunk summaries with a frontier model.
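A minimal sketch of the chunk-summarize-synthesize workaround, under stated assumptions: `split_into_chunks` and `chunk_summarize_synthesize` are hypothetical helper names, the two models are passed in as plain callables (no real client API is assumed), and whitespace-separated words stand in for tokens, which is only a rough proxy for a real tokenizer.

```python
from typing import Callable, List

# A summarizer is any function that maps input text to summary text.
# In practice it would wrap a model call; that wiring is left to the caller.
Summarizer = Callable[[str], str]


def split_into_chunks(text: str, chunk_tokens: int = 5000) -> List[str]:
    """Split text into consecutive chunks of at most chunk_tokens words.

    Assumption: whitespace-separated words approximate tokens.
    """
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_tokens])
        for i in range(0, len(words), chunk_tokens)
    ]


def chunk_summarize_synthesize(
    text: str,
    small_model: Summarizer,
    frontier_model: Summarizer,
    chunk_tokens: int = 5000,
) -> str:
    """Summarize each chunk with the small model, then merge with the frontier model."""
    chunks = split_into_chunks(text, chunk_tokens)
    if not chunks:
        return ""
    # Each chunk is short, so mid-document content is never far from a chunk edge.
    partial_summaries = [small_model(chunk) for chunk in chunks]
    # Label the summaries so the synthesis step can keep the original order.
    combined = "\n\n".join(
        f"[Section {i}] {summary}"
        for i, summary in enumerate(partial_summaries, start=1)
    )
    return frontier_model(combined)
```

Passing the models in as callables keeps the sketch independent of any particular SDK; the chunk size of 5,000 is a default taken from the example in this report, not a tuned value.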
Journey Context:
The 'Lost in the Middle' phenomenon \(Liu et al., 2023\) shows that LLMs use information at the start and end of a long context far more reliably than information in the middle. For short inputs the effect is negligible; for 10K\+ token documents it is severe, because key information often sits mid-document. Quality signature: small-model summaries of long documents feel generic and miss specific numbers, entities, or nuances from the middle. Practical thresholds: under 2K tokens the effect is minimal; at 10K\+ it is significant; at 50K\+ even frontier models degrade, though less severely.

The chunk-summarize-synthesize workaround: split a 50K-token document into ten 5K-token chunks, summarize each with Haiku \($0.25/M\), then synthesize the 10 chunk summaries \(roughly 500 tokens each\) with Sonnet \($3/M\). Cost, counting input tokens only: ~50K tokens at Haiku rates \+ ~5K tokens at Sonnet rates ≈ $0.0125 \+ $0.015 = $0.0275, versus 50K tokens at Sonnet = $0.15. That is roughly 5x cheaper, with better mid-document coverage.
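The cost comparison above can be reproduced with a short calculation. This is a sketch that counts input tokens only and uses the per-million rates quoted in this report; the function names and the 500-token summary length per chunk are assumptions (the latter inferred from ten summaries totalling about 5K tokens). Output-token charges are ignored, so real totals will be somewhat higher.

```python
import math

# Input prices in USD per million tokens, as quoted in this report.
HAIKU_USD_PER_M = 0.25
SONNET_USD_PER_M = 3.00


def input_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given rate."""
    return tokens * usd_per_million / 1_000_000


def compare_costs(
    doc_tokens: int = 50_000,
    chunk_tokens: int = 5_000,
    summary_tokens_per_chunk: int = 500,  # assumption, not a measured value
) -> dict:
    """Compare the chunked pipeline against a single frontier-model pass."""
    n_chunks = math.ceil(doc_tokens / chunk_tokens)
    # Stage 1: the small model reads the whole document, one chunk at a time.
    chunk_stage = input_cost(doc_tokens, HAIKU_USD_PER_M)
    # Stage 2: the frontier model reads only the chunk summaries.
    synth_stage = input_cost(n_chunks * summary_tokens_per_chunk, SONNET_USD_PER_M)
    pipeline = chunk_stage + synth_stage
    # Baseline: the frontier model reads the whole document directly.
    direct = input_cost(doc_tokens, SONNET_USD_PER_M)
    return {
        "n_chunks": n_chunks,
        "chunk_stage_usd": chunk_stage,
        "synth_stage_usd": synth_stage,
        "pipeline_usd": pipeline,
        "direct_usd": direct,
        "savings_ratio": direct / pipeline,
    }


if __name__ == "__main__":
    for key, value in compare_costs().items():
        print(f"{key}: {value}")
```

The savings ratio is sensitive to summary length: longer chunk summaries raise the synthesis-stage cost, which is billed at the frontier rate.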
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:59:43.965763+00:00— report_created — created