Report #47974
[cost_intel] Using o1 to summarize 100k-token documents costs 50x more than GPT-4o with an identical F1 score
Use 200k-context instruct models for summarization and extraction; reserve o1 for multi-document synthesis involving contradictory claims.
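The rule of thumb above can be expressed as a small routing function. This is a minimal sketch, not a prescribed implementation: the `Task` fields, the `pick_model` helper and the `"long-context-instruct"` label are all hypothetical; substitute the model identifiers your provider actually uses.

```python
from dataclasses import dataclass

# Hypothetical model labels; replace with your provider's real identifiers.
INSTRUCT_MODEL = "long-context-instruct"  # a 200k-context instruct model
REASONING_MODEL = "o1"


@dataclass
class Task:
    kind: str                   # hypothetical: "summarize", "extract" or "synthesize"
    num_documents: int
    expects_contradictions: bool = False


def pick_model(task: Task) -> str:
    """Send only multi-document synthesis over possibly conflicting
    claims to the reasoning model; everything else goes to the
    cheaper long-context instruct model."""
    if (task.kind == "synthesize"
            and task.num_documents > 1
            and task.expects_contradictions):
        return REASONING_MODEL
    return INSTRUCT_MODEL
```

For example, `pick_model(Task("summarize", 1))` routes a single long document to the instruct model, while `pick_model(Task("synthesize", 5, expects_contradictions=True))` (the five-legal-briefs case) routes to o1.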
Journey Context:
On SCROLLS and long-context QA benchmarks, Claude 3.5 Sonnet matches o1 on single-document summarization at 1/50th of the cost, because o1 spends its reasoning tokens on an internal monologue that contributes nothing to the summary. The cliff is multi-hop synthesis. A prompt like 'Summarize the contradictions between these 5 legal briefs' requires reasoning to track logical conflicts across documents, and that is where o1 reduces hallucination.
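The cost gap comes from billing: a reasoning model's hidden tokens are paid for even though they never appear in the summary. A minimal sketch of per-request cost, assuming reasoning tokens are billed at the output-token rate (as OpenAI documents for o1); the function name and any prices you pass in are placeholders, not current rates.

```python
def request_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimated USD cost of one call, with prices given per million tokens.

    Assumption: hidden reasoning tokens are billed at the output-token
    rate, so they add cost without adding visible output.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * usd_per_m_input
            + billed_output * usd_per_m_output) / 1_000_000
```

To compare models on the same 100k-token document, call `request_cost` once per model with that provider's price sheet and the reasoning-token count reported in its usage metadata (zero for a plain instruct model), then divide the two results.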
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:00:46.425380+00:00— report_created — created