Report #96554
[cost\_intel] Single-shot long-context summarization with GPT-4/Claude 3 Opus on documents >50k tokens costs 5-10x more than tiered map-reduce
Implement hierarchical map-reduce: chunk documents at 8k-16k token boundaries, summarize each chunk with Claude 3 Haiku \($0.25/1M input tokens\), then synthesize the final summary with Claude 3 Sonnet \($3/1M input tokens\), avoiding Opus \($15/1M input tokens\) entirely
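A minimal sketch of the pipeline described above, assuming the model calls are passed in as plain callables. The names `chunk_tokens`, `map_reduce_summarize`, `cheap_model`, and `strong_model` are hypothetical, and a list of whitespace-split words stands in for real tokenizer output. In practice `cheap_model` would wrap a Haiku call and `strong_model` a Sonnet call.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: takes a prompt string, returns the model's text.
Summarizer = Callable[[str], str]


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 8_000) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def map_reduce_summarize(
    tokens: Sequence[str],
    cheap_model: Summarizer,
    strong_model: Summarizer,
    chunk_size: int = 8_000,
) -> str:
    """Summarize each chunk with the cheap model, then merge with the strong model."""
    chunks = chunk_tokens(tokens, chunk_size)

    # Map step: one independent call per chunk (these could run in parallel).
    partials = [
        cheap_model("Summarize this section:\n" + " ".join(chunk))
        for chunk in chunks
    ]

    # Reduce step: a single call over the much shorter chunk summaries.
    numbered = "\n\n".join(
        f"[Section {i}] {summary}" for i, summary in enumerate(partials, start=1)
    )
    return strong_model(
        "Combine these section summaries into one coherent summary:\n\n" + numbered
    )
```

Keeping the models as injected callables lets the same function be exercised with stubs before any paid API call is made.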
Journey Context:
A 100k-token document sent to Claude 3 Opus costs $1.50 in input tokens alone \(at $15/1M\). With map-reduce, 12 chunks of ~8k tokens processed by Haiku come to 96k tokens at $0.25/1M = $0.024, and the final synthesis of the ~12 chunk summaries \(~6k tokens\) with Sonnet at $3/1M adds $0.018. Total: ~$0.042 versus $1.50, roughly a 36x saving on input tokens. Output tokens are billed on both paths and are not counted in these figures.

The cliff: map-reduce loses cross-chunk dependencies and global narrative structure \(e.g., a mystery novel where the ending references the beginning\). Degradation signature: summaries become list-like \('Chapter 1 said X, Chapter 2 said Y'\) rather than synthetic, and key themes that emerge only from juxtaposing distant sections are missed.

Monitor via coherence scores \(ask a model to rate summary cohesion\) or by checking for repetition of concepts across chunks. If degradation is detected, switch to the 'refine' pattern \(iterative summarization in which each chunk's summary incorporates the previous summary\) instead of pure map-reduce; this costs more than pure map-reduce but still less than single-shot Opus.
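The arithmetic above can be checked with a short script. This is a sketch that uses only the input-token prices and token counts quoted in this report and ignores output tokens. The function names are hypothetical, and the refine estimate rests on two labeled assumptions the report does not state: that every refine step runs on Sonnet, and that the running summary is about 500 tokens.

```python
# Input-token prices in USD per 1M tokens, as quoted in this report.
PRICE_PER_MTOK = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}


def input_cost(tokens: int, model: str) -> float:
    """Input-token cost in USD. Output tokens are deliberately ignored."""
    return tokens * PRICE_PER_MTOK[model] / 1_000_000


def single_shot_cost(doc_tokens: int) -> float:
    """Whole document sent to Opus in one request."""
    return input_cost(doc_tokens, "opus")


def map_reduce_cost(n_chunks: int, chunk_tokens: int, summary_tokens: int) -> float:
    """Haiku reads every chunk; Sonnet reads the concatenated chunk summaries."""
    map_cost = input_cost(n_chunks * chunk_tokens, "haiku")
    reduce_cost = input_cost(n_chunks * summary_tokens, "sonnet")
    return map_cost + reduce_cost


def refine_cost(
    n_chunks: int, chunk_tokens: int, summary_tokens: int, model: str = "sonnet"
) -> float:
    """ASSUMPTION: each step reads one chunk plus the running summary, on `model`.

    The first step has no previous summary, hence (n_chunks - 1).
    """
    tokens = n_chunks * chunk_tokens + (n_chunks - 1) * summary_tokens
    return input_cost(tokens, model)


if __name__ == "__main__":
    opus = single_shot_cost(100_000)
    mr = map_reduce_cost(n_chunks=12, chunk_tokens=8_000, summary_tokens=500)
    rf = refine_cost(n_chunks=12, chunk_tokens=8_000, summary_tokens=500)
    print(f"single-shot Opus: ${opus:.3f}")
    print(f"map-reduce:       ${mr:.3f}  ({opus / mr:.1f}x cheaper)")
    print(f"refine (Sonnet):  ${rf:.3f}  ({opus / rf:.1f}x cheaper)")
```

Under these assumptions the refine path sits between the other two, which matches the ordering described above. Running refine on a cheaper model would change that ordering, so the model choice should be plugged in explicitly.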
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:38:52.524093+00:00— report_created — created