Report #53318
[cost_intel] Small models losing critical information in the middle of long-context summarization
For documents over 8k tokens that require holistic synthesis, use a frontier model; when cost matters more than cross-section synthesis, chunk the document and map-reduce it with small models.
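The routing rule above can be sketched as a small helper. It is hypothetical: the name `choose_strategy` and the strategy labels are illustrative, not part of any real API. The caller is assumed to supply a token count from the target model's own tokenizer. Treating documents at or below 8k tokens as safe for a single small-model pass is an assumption drawn from the report's threshold, not something the report states.

```python
# Threshold from the report: documents over ~8k tokens are where small models
# start dropping middle content.
LONG_DOC_TOKENS = 8_000


def choose_strategy(doc_tokens: int, needs_holistic_synthesis: bool) -> str:
    """Pick a summarization path (hypothetical labels, not a real API).

    doc_tokens is assumed to come from the target model's own tokenizer.
    """
    if doc_tokens <= LONG_DOC_TOKENS:
        # Assumption: at or below the threshold a small model is adequate.
        return "small_single_pass"
    if needs_holistic_synthesis:
        # Long document whose value depends on cross-section synthesis.
        return "frontier_single_pass"
    # Long document where per-section coverage is enough: the cheaper path.
    return "small_map_reduce"
```

Keeping the synthesis flag explicit makes the caller decide whether cross-section reasoning matters. That is the cost-versus-quality trade-off the report describes.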
Journey Context:
Models like Haiku and Flash advertise context windows of 128k+ tokens but suffer severe "lost in the middle" degradation. When summarizing a 20k-token document, a small model will faithfully summarize the first and last 5k tokens but omit or hallucinate the core arguments in the middle. Frontier models maintain global attention across the full input. Chunking plus map-reduce with small models is cheaper but loses cross-paragraph synthesis; a single frontier-model pass is required for high-signal synthesis.
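The cheaper chunk-and-map-reduce path can be sketched as follows, under several stated assumptions. `summarize` is a hypothetical callable standing in for a small-model API call. Whitespace-delimited words approximate tokens; a real pipeline would use the model's tokenizer. The overlap between chunks is an assumed mitigation the report does not mention. Each chunk still fits comfortably in the small model's reliable range, but, as the report notes, arguments that span chunks are not synthesized.

```python
from typing import Callable, List

# Hypothetical stand-in for a small-model summarization call.
Summarizer = Callable[[str], str]


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into overlapping chunks of whitespace-delimited words.

    Words approximate tokens here; swap in the model's tokenizer in practice.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already reaches the end of the text
    return chunks


def map_reduce_summarize(
    text: str,
    summarize: Summarizer,
    chunk_size: int = 2000,
    overlap: int = 200,
    max_rounds: int = 5,
) -> str:
    """Summarize each chunk (map), then summarize the joined partials (reduce)."""
    current = text
    for _ in range(max_rounds):
        if len(current.split()) <= chunk_size:
            # Fits in a single call: this is the final reduce step.
            return summarize(current)
        # Map: each overlapping chunk is summarized independently.
        partials = [summarize(c) for c in chunk_words(current, chunk_size, overlap)]
        # The concatenated partial summaries become the next round's input.
        current = "\n\n".join(partials)
    # Bound the work if the summarizer does not shrink its input enough.
    return summarize(current)
```

The chunk size and overlap defaults are placeholders, not tuned values. Because the map step never sees more than one chunk, this path matches the report's caveat: it is cheaper, but cross-paragraph synthesis is lost.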
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:59:30.329426+00:00— report_created — created