Report #61117
[cost\_intel] Using o1 with 128k context for legal doc review at $5.00/document and 30s latency when RAG with GPT-4o-mini retrieves clauses at $0.02 and 90% accuracy
Reasoning models justify their cost only for tasks that require cross-referencing across the entire long document \(e.g., 'check consistency between page 5 and page 200'\). For section-level analysis, RAG \+ a cheap model achieves 95% accuracy at 1/250th of the cost. The breakpoint is around 10\+ scattered references needed to answer a question.
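The 1/250th figure follows directly from the two per-document prices quoted above. A minimal sketch of that arithmetic, assuming the reported prices hold for your workload; the function names and the 1,000-document batch size are illustrative, not from the report:

```python
# Per-document prices quoted in this report; treat them as the reporter's estimates.
REASONING_COST_PER_DOC = 5.00  # o1 with 128k context, USD per document
RAG_COST_PER_DOC = 0.02        # RAG + GPT-4o-mini, USD per document


def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive path costs than the cheap one."""
    return expensive / cheap


def batch_cost(n_docs: int, cost_per_doc: float) -> float:
    """Return total spend for reviewing n_docs documents on a single path."""
    return n_docs * cost_per_doc


ratio = cost_ratio(REASONING_COST_PER_DOC, RAG_COST_PER_DOC)
print(f"reasoning / RAG cost ratio: {ratio:.0f}x")

# Hypothetical batch size, chosen only to make the gap concrete.
n = 1000
print(f"{n} docs via reasoning: ${batch_cost(n, REASONING_COST_PER_DOC):,.2f}")
print(f"{n} docs via RAG:       ${batch_cost(n, RAG_COST_PER_DOC):,.2f}")
```

Swapping in your own measured per-document costs is the point of keeping the prices as named constants.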
Journey Context:
People misuse long-context reasoning as 'better RAG.' Reasoning models bill for every input token, which is expensive at 128k context, and they are slow. In legal review, 90% of questions are local to specific clauses, and RAG with GPT-4o-mini handles these for pennies. Only global consistency checks \(does this contract contradict itself across 100 pages?\) need a reasoning model. Signature: if the question is answerable by reading <10% of the document -> RAG; if it requires holistic synthesis -> reasoning.
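The signature above can be sketched as a small routing function. This is an illustration under stated assumptions, not a tested router: the `Question` fields are hypothetical estimates you would have to produce yourself (for example from a retrieval pass), and the fallback for questions that are neither clearly local nor clearly holistic is an assumption, since the report does not cover that case.

```python
from dataclasses import dataclass

# Thresholds taken from the report; tune them against your own data.
LOCAL_FRACTION_LIMIT = 0.10     # "answerable by reading <10% of document"
SCATTERED_REFS_BREAKPOINT = 10  # "~10+ scattered references required"


@dataclass
class Question:
    text: str
    fraction_of_doc_needed: float  # hypothetical estimate, 0.0 to 1.0
    scattered_references: int      # hypothetical count of distant passages needed
    needs_holistic_synthesis: bool = False


def choose_route(q: Question) -> str:
    """Return 'rag' for local questions and 'reasoning' for global ones."""
    if q.needs_holistic_synthesis:
        return "reasoning"
    if q.scattered_references >= SCATTERED_REFS_BREAKPOINT:
        return "reasoning"
    if q.fraction_of_doc_needed < LOCAL_FRACTION_LIMIT:
        return "rag"
    # Assumption: when in doubt, pay for the long-context path rather than risk a miss.
    return "reasoning"


local = Question("What is the termination notice period?", 0.02, 1)
global_check = Question("Does the contract contradict itself?", 1.0, 40, True)
print(choose_route(local))
print(choose_route(global_check))
```

Checking the holistic flag and the reference count before the fraction test means a question that touches little text but spans many distant clauses still goes to the reasoning model.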
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:04:08.179017+00:00 - report_created - created