Report #46478
[cost_intel] Using reasoning models for long-context reasoning beyond their effective window
For tasks that require reasoning across more than 50k tokens (legal document analysis, codebase-wide refactoring), current reasoning models exhibit context-compression artifacts, such as losing details from the middle of the input. For these tasks, use retrieval-augmented instruct models with chunking, or dedicated long-context models (Claude 3.5 Sonnet with 200k tokens, Gemini 1.5 Pro with 1M), and apply the reasoning model only to the retrieved sub-contexts.
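A minimal sketch of the chunk-then-retrieve step, assuming plain-text input. Token counts are approximated by whitespace-separated words, and a keyword-overlap score stands in for the embedding similarity a real retrieval-augmented setup would use. `chunk_document`, `select_subcontext`, and the default sizes (1000-word chunks, 100-word overlap, 8000-word budget) are hypothetical names and values chosen for illustration; only the selected chunks would be sent to the reasoning model.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int      # position in the original document
    text: str
    n_tokens: int   # approximated as whitespace-separated words (assumption)


def chunk_document(text: str, chunk_tokens: int = 1000, overlap: int = 100) -> list[Chunk]:
    """Split text into overlapping windows of roughly chunk_tokens words."""
    if overlap >= chunk_tokens:
        raise ValueError("overlap must be smaller than chunk_tokens")
    words = text.split()
    step = chunk_tokens - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_tokens]
        chunks.append(Chunk(len(chunks), " ".join(window), len(window)))
        if start + chunk_tokens >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def _terms(s: str) -> set[str]:
    """Lower-cased alphanumeric terms; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9_]+", s.lower()))


def select_subcontext(chunks: list[Chunk], query: str, budget_tokens: int = 8000) -> list[Chunk]:
    """Greedily pick the highest-scoring chunks that fit within budget_tokens."""
    q = _terms(query)
    scored = [(len(q & _terms(c.text)), c) for c in chunks]
    # list.sort is stable even with reverse=True, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    picked: list[Chunk] = []
    used = 0
    for score, chunk in scored:
        if score == 0:
            break  # remaining chunks share no terms with the query
        if used + chunk.n_tokens > budget_tokens:
            continue  # skip chunks that would overflow the budget
        picked.append(chunk)
        used += chunk.n_tokens
    return sorted(picked, key=lambda c: c.index)  # restore reading order
```

For example, `select_subcontext(chunk_document(contract_text), "indemnification cap")` (with `contract_text` a hypothetical input) returns the chunks that mention those terms, capped at roughly 8k words and kept in document order.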
Journey Context:
Reasoning models have a shorter effective context window than their token limit suggests: reasoning tokens consume part of the context budget, and attention struggles to stay coherent over long chains. At 100k+ tokens, o1-preview shows 'middle context loss': it reasons well about the beginning and end of a long document but misses critical details in the middle. The cost is also prohibitive. Reasoning models bill every input token of the full context at their higher rate (and, for o1-family models, hidden reasoning tokens are billed as output), so a 100k-token input costs $3-15 per query versus about $0.50 for an instruct model. The correct pattern is 'map-reduce': use a cheap instruct model to chunk and summarize, then apply the reasoning model only to the critical 4k-8k-token subset that requires deep logic.
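The map-reduce pattern and the cost argument can be sketched together. This is a minimal sketch under stated assumptions: `summarize` and `reason` are hypothetical callables wrapping whatever instruct and reasoning model clients you use, word counts approximate tokens, and `Pricing` takes per-million-token rates you supply; no real provider prices are hard-coded, and the names `compare_costs` and `map_reduce` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pricing:
    """USD per 1M tokens; take real values from the provider's current price list."""
    input_per_m: float
    output_per_m: float


def cost_usd(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def compare_costs(doc_tokens: int, subset_tokens: int, summary_tokens: int,
                  reasoning_out: int, instruct: Pricing, reasoning: Pricing) -> tuple[float, float]:
    """Return (direct, map_reduce) cost for one query.

    reasoning_out should include hidden reasoning tokens, which OpenAI's
    o1-family models bill as output tokens.
    """
    direct = cost_usd(reasoning, doc_tokens, reasoning_out)
    pipeline = (cost_usd(instruct, doc_tokens, summary_tokens)        # map: cheap model reads everything
                + cost_usd(reasoning, subset_tokens, reasoning_out))  # reduce: reasoning on the subset
    return direct, pipeline


def map_reduce(chunks: list[str], question: str,
               summarize: Callable[[str, str], str],
               reason: Callable[[str, str], str],
               max_reduce_words: int = 8000) -> str:
    """summarize: cheap instruct model; reason: reasoning model (both hypothetical callables)."""
    # Map: condense each chunk independently; an empty note means "irrelevant".
    notes = [summarize(chunk, question) for chunk in chunks]
    context = "\n\n".join(n for n in notes if n.strip())
    # Guard the reduce step's input size (word count approximates tokens).
    if len(context.split()) > max_reduce_words:
        raise ValueError("reduced context exceeds budget; tighten the map step")
    # Reduce: the reasoning model sees only the condensed subset.
    return reason(context, question)
```

Because the reasoning model's input is bounded by the condensed notes rather than the whole document, its share of the cost stays roughly constant as documents grow; only the cheap map step scales with document length.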
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:29:12.118196+00:00 - report_created - created