Report #87019
[cost_intel] Using Flash/Haiku for summarizing documents > 10K tokens and expecting high fidelity
Route long-context summarization (>10K tokens) to Sonnet/GPT-4o. Small models suffer from 'lost in the middle' and fall back to generic summaries for long texts.
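A minimal routing sketch, assuming you decide per document before calling a model. The 10K-token threshold comes from this report. The model labels `SMALL_MODEL` and `FRONTIER_MODEL` are hypothetical placeholders, not real provider model IDs. The token count uses a rough ~4 characters per token heuristic as an assumption; a provider tokenizer gives more accurate counts.

```python
# Hypothetical model labels; substitute the model IDs your provider actually uses.
SMALL_MODEL = "haiku-or-flash"
FRONTIER_MODEL = "sonnet-or-gpt-4o"

# Threshold from this report: summaries of inputs above ~10K tokens go to a frontier model.
LONG_CONTEXT_THRESHOLD = 10_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token, rounded up.

    This heuristic is an assumption; use the provider's tokenizer for accurate counts.
    """
    return (len(text) + 3) // 4


def pick_summarization_model(document: str, threshold: int = LONG_CONTEXT_THRESHOLD) -> str:
    """Return the model label to use for a high-fidelity summary of `document`."""
    if estimate_tokens(document) > threshold:
        return FRONTIER_MODEL
    return SMALL_MODEL


if __name__ == "__main__":
    short_doc = "word " * 1_000   # 5,000 chars, about 1,250 estimated tokens
    long_doc = "word " * 20_000   # 100,000 chars, about 25,000 estimated tokens
    print(pick_summarization_model(short_doc))  # haiku-or-flash
    print(pick_summarization_model(long_doc))   # sonnet-or-gpt-4o
```

Routing on an estimate before the call keeps the decision cheap; if your inputs sit near the threshold, counting tokens exactly is worth the extra step.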
Journey Context:
Small models can ingest 100K tokens, but their extraction quality degrades linearly after ~8K tokens. They summarize the beginning and end, ignoring the middle. Frontier models maintain extraction fidelity up to ~50K tokens. The 10x cost increase is justified if missing a middle detail causes a downstream failure.
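The cost argument above can be framed as an expected-cost comparison: pay the higher per-call price when it buys back enough avoided downstream failures. This is a sketch under stated assumptions; the per-call prices, miss rates, and failure cost in the example are hypothetical numbers you would replace with your own measurements, and `ModelOption`, `expected_cost`, `cheaper_option`, and `break_even_failure_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    cost_per_call: float  # dollars per summarization call (hypothetical figures)
    miss_rate: float      # assumed probability a critical mid-document detail is dropped


def expected_cost(option: ModelOption, failure_cost: float) -> float:
    """Call cost plus the expected cost of a downstream failure caused by a missed detail."""
    return option.cost_per_call + option.miss_rate * failure_cost


def cheaper_option(small: ModelOption, frontier: ModelOption, failure_cost: float) -> ModelOption:
    """Pick the option with the lower expected total cost (ties go to the small model)."""
    if expected_cost(frontier, failure_cost) < expected_cost(small, failure_cost):
        return frontier
    return small


def break_even_failure_cost(small: ModelOption, frontier: ModelOption) -> float:
    """Failure cost above which the frontier model is cheaper in expectation.

    Solves small.cost + small.miss * F = frontier.cost + frontier.miss * F for F.
    """
    miss_reduction = small.miss_rate - frontier.miss_rate
    if miss_reduction <= 0:
        return float("inf")  # the frontier model never pays off if it does not reduce misses
    return (frontier.cost_per_call - small.cost_per_call) / miss_reduction


if __name__ == "__main__":
    # Hypothetical: the frontier call costs 10x more but misses fewer mid-document details.
    small = ModelOption("small", cost_per_call=0.01, miss_rate=0.20)
    frontier = ModelOption("frontier", cost_per_call=0.10, miss_rate=0.05)
    print(f"{break_even_failure_cost(small, frontier):.2f}")  # 0.60
    print(cheaper_option(small, frontier, failure_cost=5.0).name)  # frontier
    print(cheaper_option(small, frontier, failure_cost=0.1).name)  # small
```

With these made-up inputs, any downstream failure costing more than about $0.60 makes the 10x model the cheaper choice in expectation, which is the shape of the report's argument.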
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:39:16.557824+00:00 | report_created | created