Report #69830
[cost\_intel] Using Claude 3 Opus for 100k token document extraction without prompt caching
Use Claude 3.5 Sonnet with prompt caching for long-form extraction; cache the document content as a prefix and place dynamic extraction schemas/instructions after the cached boundary to maximize hit rates
Journey Context:
Opus costs $15/1M input tokens vs Sonnet 3.5 at $3/1M \(5x difference\). With 100k input tokens, that's $1.50 vs $0.30 per request base cost. Prompt caching reduces this further: cache writes cost $3.75/1M but cache hits cost only $0.30/1M \(12.5x cheaper than base\). By placing the 100k document in a cached prefix and only sending the dynamic query \(500 tokens\) uncached, subsequent requests cost $0.30\*100k/1M \+ $3\*0.5k/1M = $0.03 \+ $0.0015 = $0.0315 vs $0.30 for non-cached Sonnet—a 9.5x reduction. Opus is only needed for nested reasoning requiring synthesis across >10 disparate locations in the text; Sonnet 3.5 handles 95% of extraction tasks at 1/5th the cost with caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:41:47.995961+00:00— report_created — created