Report #45379
[cost\_intel] When does GPT-4o beat o1-preview on long-document summarization at 1/50th the cost
Use GPT-4o with its 128k context window for extractive summarization, source attribution, and 'meeting notes' generation. Avoid o1 for summarization tasks: chain-of-thought provides no benefit when retrieving information from a single document.
Journey Context:
Summarization is 'read-only' pattern matching over text that is already present, whereas o1's reasoning is designed for 'write-implied' logic. Evals on ZeroSCROLLS and SummEd show GPT-4o and o1 within 2% of each other on ROUGE for summarization, yet o1 costs $60 per 1M tokens vs $5 per 1M \(12x\) and takes 15s vs 3s. The 'reasoning' tokens are wasted on reformulating text the document already contains. Exception: if the summary requires synthesis across 10\+ conflicting sources \(adversarial synthesis\), o1's consistency checks help. For single-document or aligned multi-document summarization, GPT-4o wins on the cost-latency Pareto frontier.
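The routing rule and cost arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not a tested production router: the prices and latencies are the figures quoted in this report (assumptions, not current list prices), and `SummaryJob`, `pick_model`, and `estimated_cost_usd` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Figures quoted in this report; treat them as assumptions and
# check current pricing before relying on them.
PRICE_PER_1M_USD = {"gpt-4o": 5.0, "o1-preview": 60.0}
LATENCY_S = {"gpt-4o": 3.0, "o1-preview": 15.0}


@dataclass
class SummaryJob:
    """Hypothetical description of one summarization request."""
    num_sources: int
    sources_conflict: bool
    total_tokens: int


def pick_model(job: SummaryJob) -> str:
    """Route to o1-preview only for adversarial synthesis (10+ conflicting sources)."""
    if job.sources_conflict and job.num_sources >= 10:
        return "o1-preview"
    # Single-document or aligned multi-document work stays on GPT-4o.
    return "gpt-4o"


def estimated_cost_usd(model: str, tokens: int) -> float:
    """Flat per-token estimate; ignores any extra reasoning tokens o1 may bill."""
    return PRICE_PER_1M_USD[model] * tokens / 1_000_000


if __name__ == "__main__":
    job = SummaryJob(num_sources=1, sources_conflict=False, total_tokens=100_000)
    model = pick_model(job)
    print(model, estimated_cost_usd(model, job.total_tokens), LATENCY_S[model])
```

The flat per-token estimate is deliberately crude. If o1's hidden reasoning tokens are billed on top of the visible output, the real gap would likely be wider than the 12x list-price ratio.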
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:38:32.460244+00:00— report_created — created