Report #48616
[cost_intel] Long-context contradiction detection vs. extractive summarization
Use o1/o3 for tasks like "find logical contradictions across a 100k-token context" (o1 beats 4o by 40%+). Use GPT-4o for tasks like "summarize this 100k-token document" (equal quality, ~10x faster and cheaper).
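The routing rule above can be encoded as a small lookup. This is a minimal sketch, not a tested router: `TaskKind`, `classify_task`, `pick_model`, and the keyword lists are hypothetical names chosen for illustration, and the keyword heuristic is an assumption. Only the model choices ("o1" for contradiction detection, "gpt-4o" for extractive summarization) come from this report.

```python
from enum import Enum


class TaskKind(Enum):
    CONTRADICTION_DETECTION = "contradiction_detection"
    EXTRACTIVE_SUMMARY = "extractive_summary"


# Routing table reflecting this report's recommendation.
ROUTES = {
    TaskKind.CONTRADICTION_DETECTION: "o1",
    TaskKind.EXTRACTIVE_SUMMARY: "gpt-4o",
}

# Hypothetical keyword heuristic; a real system would need something sturdier.
CONTRADICTION_HINTS = ("contradict", "inconsisten", "conflict")
SUMMARY_HINTS = ("summar", "key points", "extract")


def classify_task(description: str) -> TaskKind:
    """Map a free-text task description to a TaskKind (assumed heuristic)."""
    text = description.lower()
    # Check contradiction hints first: they signal the expensive, reasoning-heavy case.
    if any(hint in text for hint in CONTRADICTION_HINTS):
        return TaskKind.CONTRADICTION_DETECTION
    if any(hint in text for hint in SUMMARY_HINTS):
        return TaskKind.EXTRACTIVE_SUMMARY
    # Refuse to guess rather than silently defaulting to the cheaper model.
    raise ValueError(f"cannot classify task: {description!r}")


def pick_model(kind: TaskKind) -> str:
    """Return the model name this report recommends for the task kind."""
    return ROUTES[kind]
```

For example, `pick_model(classify_task("summarize this 100k document"))` returns `"gpt-4o"`. Unknown tasks raise instead of falling back to 4o, because the report suggests a wrong cheap choice on contradiction work costs more quality than a wrong expensive choice on summarization.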
Journey Context:
Reasoning models excel at synthesizing information across long contexts when the task requires logical consistency checks (legal contract review, finding contradictions in scientific papers). On "needle in a haystack" tasks that add a reasoning step, o1 maintains high accuracy, while 4o's accuracy drops off beyond about 32k tokens. For extractive summarization (pulling out key points), however, both models read the same full context and 4o is sufficient. Processing 100k tokens costs ~$1.50 with 4o versus ~$15 with o1. For contradiction detection the quality cliff is steep: 4o misses subtle logical conflicts that o1 catches. For summarization the curve is flat: both models miss the same details and hallucinate in similar ways.
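To see how the per-request gap scales, the arithmetic can be written out. This sketch assumes cost grows linearly with context length and uses only the per-100k-token figures quoted in this report (~$1.50 for 4o, ~$15 for o1), not official pricing; `REPORTED_COST_PER_100K` and `estimate_cost` are hypothetical names.

```python
# Cost per 100k-token request as quoted in this report (USD); not official pricing.
REPORTED_COST_PER_100K = {"gpt-4o": 1.50, "o1": 15.00}


def estimate_cost(model: str, context_tokens: int, n_requests: int = 1) -> float:
    """Linear cost estimate scaled from the report's per-100k-token figures."""
    return REPORTED_COST_PER_100K[model] * context_tokens / 100_000 * n_requests


# How many times more o1 costs than 4o per request, using the report's figures.
COST_RATIO = REPORTED_COST_PER_100K["o1"] / REPORTED_COST_PER_100K["gpt-4o"]

if __name__ == "__main__":
    # A batch of 1,000 documents of 100k tokens each.
    for model in ("gpt-4o", "o1"):
        print(model, estimate_cost(model, 100_000, n_requests=1_000))
    print("ratio", COST_RATIO)
```

Under these assumptions a 1,000-document batch comes to about $1,500 on 4o versus $15,000 on o1. The 10x premium is worth paying only where the quality curve is steep (contradiction detection), not where it is flat (summarization).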
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:05:09.664442+00:00— report_created — created