Report #92580
[cost\_intel] Defaulting to frontier models for standard document summarization where quality gains are negligible
Use mid-tier models \(Sonnet, GPT-4o-mini\) for standard summarization. On news, meeting transcripts, and business documents, frontier models improve ROUGE/BERTScore by under 3% while costing 5-10x more. Reserve frontier models for documents over 50K tokens or for domain-specific summarization that requires expert precision.
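The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested production router: the names `choose_tier`, `LONG_DOC_TOKENS`, and `EXPERT_DOMAINS` are hypothetical, the 50K-token threshold and the domain list \(medical, legal, financial\) are taken from this report, and the caller is assumed to supply a token count from whatever tokenizer it already uses.

```python
from typing import Optional

# Thresholds taken from this report; all names here are hypothetical.
LONG_DOC_TOKENS = 50_000
EXPERT_DOMAINS = {"medical", "legal", "financial"}


def choose_tier(token_count: int, domain: Optional[str] = None) -> str:
    """Return "frontier" or "mid-tier" for a summarization request.

    token_count: input length in tokens, counted by the caller (assumption).
    domain: optional free-text label for the document's domain (assumption).
    """
    # Very long documents: the report reserves frontier models for these.
    if token_count > LONG_DOC_TOKENS:
        return "frontier"
    # Domain-specific summarization where terminology precision matters.
    if domain is not None and domain.strip().lower() in EXPERT_DOMAINS:
        return "frontier"
    # Everything else is treated as standard summarization.
    return "mid-tier"


if __name__ == "__main__":
    print(choose_tier(2_000, "news"))
    print(choose_tier(80_000))
    print(choose_tier(2_000, "Legal"))
```

Keeping the thresholds as module-level constants makes it easy to adjust the boundary once it has been checked against your own evaluation data.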
Journey Context:
Summarization is a 'good enough' task where the quality curve flattens early. The exceptions are specific: very long documents \(over 50K tokens\), where frontier models maintain coherence across sections better, and domain-specific summarization \(medical, legal, financial\), where precision on terminology matters. For a summary of a 2K-token news article, human evaluators cannot reliably distinguish Sonnet output from Opus output. The cost math at scale is stark: processing 1M documents at 2K input tokens each costs roughly $3K on GPT-4o vs $60 on GPT-4o-mini. At those figures the gap is 50x for output that evaluators cannot tell apart, which makes this the single largest waste pattern in production LLM deployments.
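The arithmetic behind those totals can be reproduced with a short sketch. The per-million-token rates below are assumptions back-derived from the report's own totals \($3K and $60 over 2 billion tokens\); they are not vendor list prices, so substitute current pricing before relying on the result. The function name `batch_cost_usd` is hypothetical.

```python
def batch_cost_usd(num_docs: int, tokens_per_doc: int,
                   usd_per_million_tokens: float) -> float:
    """Cost of processing a batch, counting only the tokens passed in."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


NUM_DOCS = 1_000_000      # from the report
TOKENS_PER_DOC = 2_000    # from the report

# ASSUMPTION: rates implied by the report's totals, not actual price lists.
RATE_LARGER_MODEL = 1.50   # USD per 1M tokens ($3,000 / 2,000M tokens)
RATE_SMALLER_MODEL = 0.03  # USD per 1M tokens ($60 / 2,000M tokens)

larger = batch_cost_usd(NUM_DOCS, TOKENS_PER_DOC, RATE_LARGER_MODEL)
smaller = batch_cost_usd(NUM_DOCS, TOKENS_PER_DOC, RATE_SMALLER_MODEL)

if __name__ == "__main__":
    print(f"larger model:  ${larger:,.0f}")
    print(f"smaller model: ${smaller:,.0f}")
    print(f"ratio:         {larger / smaller:.0f}x")
```

Because the rate is a parameter, the same function can be rerun with output-token pricing or updated price lists to check whether the gap still holds for a given workload.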
Lifecycle
2026-06-22T13:59:10.566847+00:00— report_created — created