Report #38575
[cost_intel] Long-document RAG: retrieval (find a specific quote) vs. synthesis (compare arguments across 5 sections)
Use cheap instruct models with long context (128k+) for literal retrieval; reserve reasoning models for cross-section synthesis, where inference-time compute matters more than context length.
Journey Context:
For a query like 'Find the clause about termination in this 100-page contract', GPT-4o with its 128k context window finds the clause with >95% accuracy at a cost of $0.50. Reasoning models cost $10+ on the same task and add no value, because the task is literal matching (needle-in-a-haystack). For 'Identify contradictions between Section 3 and Section 8 regarding liability', however, reasoning models perform the logical inference that cheap models miss even when both sections are in context. The distinction: retrieval scales with context length (cheap), while synthesis scales with reasoning depth (expensive).
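The split described above could be wired into a simple router. What follows is a minimal sketch, not a tested production approach: the keyword cues, the tier names (`long_context_instruct`, `reasoning`), and the `route_query` / `blended_cost` helpers are all hypothetical, and the dollar figures are just the rough numbers quoted in this report ($10 being the lower bound for the reasoning tier). A real router would probably need a small classifier rather than substring matching.

```python
from dataclasses import dataclass

# Rough cost figures from this report (assumption: treated as flat costs).
COST_USD = {
    "long_context_instruct": 0.50,  # e.g. a 128k-context instruct model
    "reasoning": 10.00,             # report says "$10+", so this is a floor
}

# Hypothetical cues suggesting the query needs cross-section inference
# rather than literal matching. Purely illustrative.
SYNTHESIS_CUES = ("contradict", "compare", "reconcile", "inconsisten", "conflict")


@dataclass(frozen=True)
class Route:
    task: str            # "retrieval" or "synthesis"
    model_tier: str      # key into COST_USD
    est_cost_usd: float  # estimated cost for this query


def route_query(query: str) -> Route:
    """Pick a model tier based on whether the query looks like synthesis."""
    q = query.lower()
    if any(cue in q for cue in SYNTHESIS_CUES):
        return Route("synthesis", "reasoning", COST_USD["reasoning"])
    # Default: literal lookup, so the cheap long-context tier is enough.
    return Route("retrieval", "long_context_instruct",
                 COST_USD["long_context_instruct"])


def blended_cost(queries: list[str]) -> float:
    """Total estimated cost when each query is routed individually."""
    return sum(route_query(q).est_cost_usd for q in queries)


if __name__ == "__main__":
    examples = [
        "Find the clause about termination in this 100-page contract",
        "Identify contradictions between Section 3 and Section 8 regarding liability",
    ]
    for q in examples:
        print(route_query(q))
    print("total:", blended_cost(examples))
```

Defaulting to the cheap tier keeps the common case inexpensive; the trade-off is that a synthesis query phrased without any of the cues gets under-served, so the cue list (or whatever replaces it) is the part worth checking first.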
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:13:20.206265+00:00— report_created — created