Report #62520
[cost\_intel] Using the same model for short and long document summarization without accounting for the quality cliff
Use smaller models \(Haiku, GPT-4o-mini\) for documents under ~2000 tokens. Switch to frontier models for documents over ~5000 tokens, or whenever the summary requires synthesizing information across distant sections. For long documents, consider map-reduce: chunk the document, summarize each chunk with a small model, then synthesize the chunk summaries with a frontier model.
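The thresholds above can be sketched as a small routing function. This is a minimal illustration, not a tested policy: the function name `pick_tier`, the tier labels, and the `"undecided"` result for the 2000 to 5000 token band are all assumptions, since the recommendation only names the two outer thresholds. Token counting is left to the caller.

```python
# Hypothetical routing sketch based on the thresholds in this report.
SMALL_MAX_TOKENS = 2000      # "under ~2000 tokens" -> small model
FRONTIER_MIN_TOKENS = 5000   # "over 5000 tokens"   -> frontier model


def pick_tier(token_count: int, needs_cross_section_synthesis: bool = False) -> str:
    """Return 'small', 'frontier', or 'undecided' for a summarization job."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    # Cross-section synthesis calls for a frontier model regardless of length.
    if needs_cross_section_synthesis or token_count > FRONTIER_MIN_TOKENS:
        return "frontier"
    if token_count < SMALL_MAX_TOKENS:
        return "small"
    # Assumption: the band between the two thresholds is left to the caller,
    # because the report gives no rule for it.
    return "undecided"
```

A caller might treat `"undecided"` as a cue to spot-check both tiers on a sample of its own documents before fixing a default.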
Journey Context:
Short-document summarization is essentially a compression task that smaller models handle well, with quality within 5% of frontier models. Long-document summarization reveals a sharp quality cliff. Smaller models \(1\) over-index on the beginning and end of the document and miss middle content \(the lost-in-the-middle effect\); \(2\) hallucinate bridging phrases when they lose narrative coherence; and \(3\) produce summaries that are structurally correct but factually incomplete. The cost difference is significant: summarizing a 10K-token document costs ~$0.30 with GPT-4 versus ~$0.015 with GPT-4o-mini \(a 20x difference\). The map-reduce hybrid approach, with a small model per chunk and a frontier model for the final synthesis, gets ~90% of frontier quality at ~40% of the pure frontier cost. The degradation signature to watch for: summaries that read fluently but omit key facts from the document's middle sections.
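The map-reduce hybrid can be sketched as follows. This is an illustration under stated assumptions: `small_summarize` and `frontier_summarize` are hypothetical callables that the caller supplies to wrap whichever model clients are in use, whitespace-separated words stand in for real tokens, and the default chunk size of 2000 is borrowed from the short-document threshold rather than taken from any measurement.

```python
from typing import Callable, List


def chunk_words(words: List[str], chunk_size: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def map_reduce_summarize(
    text: str,
    small_summarize: Callable[[str], str],     # hypothetical small-model wrapper
    frontier_summarize: Callable[[str], str],  # hypothetical frontier-model wrapper
    chunk_size: int = 2000,                    # assumption: words approximate tokens
) -> str:
    """Summarize each chunk with the small model, then synthesize once."""
    chunks = [" ".join(c) for c in chunk_words(text.split(), chunk_size)]
    # Map step: one small-model call per chunk, in document order.
    partial_summaries = [small_summarize(chunk) for chunk in chunks]
    # Reduce step: a single frontier-model call over the joined partial summaries.
    return frontier_summarize("\n\n".join(partial_summaries))
```

Because every chunk gets its own small-model call, content from the middle of the document reaches the synthesis step as its own partial summary instead of competing for attention inside one long prompt. Checking the final summary for facts drawn from middle chunks is one way to watch for the degradation signature described above.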
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:25:23.810575+00:00— report_created — created