Report #85071
[cost_intel] Using frontier models for extractive summarization where small models match within 5%
Use Haiku/Flash for extractive summarization and straightforward abstractive summarization. Reserve frontier models for summarization that requires nuanced judgment about strategic importance, cross-referencing multiple sources, or matching a specific analytical voice. Quality gap between small and frontier models: <5% for extractive, 15-20% for complex abstractive.
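The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested policy: `SummaryTask`, its fields, `pick_tier`, and the tier labels `"small"` and `"frontier"` are hypothetical names introduced here, and in practice the flags would have to come from your own task metadata.

```python
from dataclasses import dataclass


@dataclass
class SummaryTask:
    """Hypothetical description of a summarization job (names are illustrative)."""
    extractive: bool
    cross_references_sources: bool = False
    needs_strategic_judgment: bool = False
    needs_specific_voice: bool = False


def pick_tier(task: SummaryTask) -> str:
    """Return "small" or "frontier" following the recommendation above."""
    # Extractive work is where small models are reported to match within 5%.
    if task.extractive:
        return "small"
    # Abstractive work goes to a frontier model only when it needs one of
    # the three capabilities named in the recommendation.
    if (
        task.cross_references_sources
        or task.needs_strategic_judgment
        or task.needs_specific_voice
    ):
        return "frontier"
    # Straightforward abstractive summarization stays on the small tier.
    return "small"
```

For example, `pick_tier(SummaryTask(extractive=False, cross_references_sources=True))` selects the frontier tier, while a plain transcript-to-action-items task with all flags left at their defaults stays on the small tier.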
Journey Context:
Extractive summarization (selecting and condensing key passages) is largely pattern matching, which small models handle well. Abstractive summarization varies widely: turning a meeting transcript into action items works on a small model, while condensing 10 research papers into a comparative analysis with novel synthesis requires a frontier model. The cost difference shows up at scale. Take 10K documents/day at 2000 input tokens each with 500-token summaries, which is 20M input tokens and 5M output tokens per day. Haiku at ~$1/M input + $5/M output comes to ~$45/day; Sonnet at ~$3/M input + $15/M output comes to ~$135/day; Opus at ~$15/M input + $75/M output comes to ~$675/day. The degradation signature for small models on complex summarization is output that is factually correct but shallow: it captures what was said but misses implicit connections, strategic implications, and contradictions between sources. If your summarization task can be evaluated by checking factual coverage alone, small models suffice.
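The daily cost figures above can be reproduced with a few lines of arithmetic. The sketch below takes the approximate per-million-token prices quoted in this report as given (they are not read from any pricing API); `PRICES_PER_M` and `daily_cost` are hypothetical names used only for illustration.

```python
# Approximate USD prices per million tokens as quoted in this report:
# (input price, output price). Treat them as assumptions and check
# current pricing before relying on them.
PRICES_PER_M = {
    "haiku": (1.0, 5.0),
    "sonnet": (3.0, 15.0),
    "opus": (15.0, 75.0),
}


def daily_cost(model, docs_per_day=10_000, input_tokens=2_000, output_tokens=500):
    """Estimate daily spend in USD for one model at the given volume."""
    in_price, out_price = PRICES_PER_M[model]
    input_millions = docs_per_day * input_tokens / 1_000_000    # 20M at defaults
    output_millions = docs_per_day * output_tokens / 1_000_000  # 5M at defaults
    return input_millions * in_price + output_millions * out_price


for model in PRICES_PER_M:
    print(f"{model}: ${daily_cost(model):,.2f}/day")
# haiku: $45.00/day
# sonnet: $135.00/day
# opus: $675.00/day
```

Changing `docs_per_day`, `input_tokens`, or `output_tokens` shows how quickly the gap between tiers grows with volume or summary length.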
Lifecycle
2026-06-22T01:22:50.121937+00:00— report_created — created