Report #91514
[cost_intel] Budget models repeating themselves in long-form summarization
Cap Haiku/Flash summarization outputs at under 500 words. If you need a coherent summary longer than 1,000 words, you must use either a frontier model or a map-reduce pipeline built on a small model.
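A minimal Python sketch of the routing rule above, assuming the requested summary length is known in words before the call. The names `Route`, `route_summary`, and `violates_cap` are hypothetical. Sending the 500 to 1,000-word range down the long-form path is also an assumption, because the rule only specifies the two ends.

```python
from enum import Enum

# Threshold from this report: keep small-model outputs under 500 words.
# Coherent summaries over 1,000 words need a frontier model or map-reduce.
SMALL_MODEL_CAP_WORDS = 500


class Route(Enum):
    SMALL_SINGLE_CALL = "small model, single call"
    SMALL_MAP_REDUCE = "small model, map-reduce"
    FRONTIER = "frontier model"


def route_summary(target_words: int, allow_frontier: bool = False) -> Route:
    """Pick a summarization route for a requested summary length in words."""
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    if target_words < SMALL_MODEL_CAP_WORDS:
        return Route.SMALL_SINGLE_CALL
    # Assumption: the 500-1,000-word range is treated like long-form,
    # since a single small-model call would already reach the cap.
    return Route.FRONTIER if allow_frontier else Route.SMALL_MAP_REDUCE


def violates_cap(summary: str, cap: int = SMALL_MODEL_CAP_WORDS) -> bool:
    """Flag a small-model output whose word count reached or passed the cap."""
    return len(summary.split()) >= cap
```

For example, `route_summary(2000)` returns `Route.SMALL_MAP_REDUCE`, while `route_summary(2000, allow_frontier=True)` returns `Route.FRONTIER`. `violates_cap` is a cheap post-hoc check for outputs that ran long anyway.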
Journey Context:
Small models hit a sharp repetition/degradation cliff once their output passes a certain length. Asking Haiku for a 2,000-word summary produces looping phrases and hallucinated conclusions. A map-reduce approach (the small model summarizes each chunk, then the small model synthesizes the partial summaries) costs slightly more in input tokens, because the partial summaries are fed back in as input for the synthesis step. In exchange, every call stays within the budget model's quality curve, avoiding the 10x cost of a frontier model.
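A minimal Python sketch of the map-reduce pipeline described above. The model call is abstracted as a hypothetical `summarize(text, max_words)` callable, not any provider's real API. The 450-word per-call ceiling, word-based chunking, and 3,000-word default chunk size are illustrative assumptions. To keep the synthesis step under the cap too, the sketch assumes a long summary is assembled from several capped sections rather than written by one long call.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical model interface: (input_text, max_words) -> summary text.
# Wire this to your provider's SDK; nothing here is a real vendor API.
SummarizeFn = Callable[[str, int], str]

# Illustrative per-call ceiling, kept below the report's <500-word cap.
PER_CALL_WORD_CAP = 450


def chunk_words(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def split_into_groups(items: Sequence[T], n: int) -> List[List[T]]:
    """Split items into n contiguous, near-equal groups (some may be empty)."""
    size, extra = divmod(len(items), n)
    groups, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        groups.append(list(items[start:end]))
        start = end
    return groups


def map_reduce_summary(text: str, target_words: int, summarize: SummarizeFn,
                       chunk_size: int = 3000) -> str:
    """Build a long summary from many small-model calls, each under the cap."""
    if target_words <= 0 or chunk_size <= 0:
        raise ValueError("target_words and chunk_size must be positive")

    # Map: summarize each chunk independently; every output stays under the cap.
    partials = [summarize(chunk, PER_CALL_WORD_CAP)
                for chunk in chunk_words(text, chunk_size)]

    # Plan the reduce: enough sections that no single call exceeds the cap.
    n_sections = -(-target_words // PER_CALL_WORD_CAP)  # ceiling division
    per_section = -(-target_words // n_sections)        # <= PER_CALL_WORD_CAP

    # Reduce: the small model synthesizes each contiguous group of partials
    # into one capped section. With fewer chunks than sections, empty groups
    # are skipped and the result comes out shorter than target_words.
    sections = [summarize("\n\n".join(group), per_section)
                for group in split_into_groups(partials, n_sections) if group]
    return "\n\n".join(sections)
```

Splitting the synthesis into several capped calls is one way to keep every output inside the small model's reliable range. If a group of partials grows too large to fit comfortably as input, summarizing groups of partials recursively before the final pass is the usual extension.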
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:11:55.251940+00:00— report_created — created