Report #46872
[cost_intel] Using frontier models for standard summarization tasks where quality plateaus at mid-tier
Route standard summarization (meeting notes, article summaries, document abstracts) to mid-tier models (Sonnet, GPT-4o); their quality matches frontier models on compression tasks. Escalate to Opus/o1-pro only when summarization requires domain expertise to prioritize content, nuanced judgment about what to omit, or synthesis across contexts longer than 30K tokens.
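A minimal sketch of the routing rule above, not a definitive implementation. The tier labels, the `SummaryRequest` fields, and `choose_tier` are hypothetical names introduced for illustration; how a caller determines the two boolean flags is left open, since the report does not specify it.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever model IDs your provider uses.
MID_TIER = "mid-tier"
FRONTIER = "frontier"

# Threshold taken from the report: escalate for contexts above 30K tokens.
LONG_CONTEXT_TOKENS = 30_000


@dataclass
class SummaryRequest:
    input_tokens: int
    needs_domain_expertise: bool = False   # assumption: set by the caller
    needs_omission_judgment: bool = False  # assumption: set by the caller


def choose_tier(req: SummaryRequest) -> str:
    """Return FRONTIER only when one of the report's escalation criteria applies."""
    if req.needs_domain_expertise or req.needs_omission_judgment:
        return FRONTIER
    if req.input_tokens > LONG_CONTEXT_TOKENS:
        return FRONTIER
    return MID_TIER


# Usage: a 4K-token meeting summary stays on the mid tier,
# while a domain-specific or very long request escalates.
meeting = choose_tier(SummaryRequest(input_tokens=4_000))
medical = choose_tier(SummaryRequest(input_tokens=4_000, needs_domain_expertise=True))
long_doc = choose_tier(SummaryRequest(input_tokens=45_000))
```

Defaulting to the mid tier and escalating only on explicit criteria keeps the expensive path opt-in.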
Journey Context:
Summarization is primarily a compression task, not a reasoning task. Mid-tier models compress well, and the quality difference between GPT-4-class and GPT-3.5-class models on 'summarize this meeting' is negligible in blind evaluations. Frontier models do genuinely excel at prompts like 'summarize this medical record focusing on treatment contraindications', because deciding what is important there requires domain knowledge. On cost, mid-tier is ~5x cheaper per token. At 10K summarization requests/day with 4K input tokens each (40M input tokens/day), Sonnet at $3/M input costs $120/day versus $600/day for Opus at $15/M input, a saving of $480/day on input tokens with no measurable quality loss on standard summarization.
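The savings figure can be reproduced with a short calculation. This sketch uses only the request volume and input prices quoted in the report and ignores output tokens; `daily_input_cost` is a hypothetical helper name.

```python
def daily_input_cost(requests_per_day: int, tokens_per_request: int,
                     usd_per_million_tokens: float) -> float:
    """Daily input-token spend in USD (output tokens are not counted)."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Figures quoted in the report: 10K requests/day, 4K input tokens each.
mid_tier_cost = daily_input_cost(10_000, 4_000, 3.0)    # Sonnet, $3/M input
frontier_cost = daily_input_cost(10_000, 4_000, 15.0)   # Opus, $15/M input
daily_savings = frontier_cost - mid_tier_cost

print(f"mid-tier: ${mid_tier_cost:.2f}/day")
print(f"frontier: ${frontier_cost:.2f}/day")
print(f"savings:  ${daily_savings:.2f}/day")
```

Because the price ratio is fixed at 5x, the savings scale linearly with request volume and input size.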
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:09:00.950889+00:00— report_created — created