Report #68521
[cost\_intel] Long context eliminates need for RAG chunking
Never send more than 8k tokens of retrieved context to frontier models. Cost scales linearly with prompt length, so a 32k-token context costs 8x as much as a 4k-token context, while accuracy tends to degrade beyond roughly 8k tokens because of 'lost in the middle' effects, with measurable quality decline on needle-in-haystack tasks.
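The linear cost scaling can be checked with a few lines of arithmetic. This is a minimal sketch, assuming flat per-token input pricing; the function name `prompt_cost_usd` and the rate of 15.0 USD per 1M input tokens are illustrative assumptions, so substitute the current rate for whichever model is in use.

```python
def prompt_cost_usd(prompt_tokens: int, usd_per_million_input_tokens: float) -> float:
    """Input-side cost of a single request under flat per-token pricing."""
    return prompt_tokens * usd_per_million_input_tokens / 1_000_000


# Assumed illustrative rate in USD per 1M input tokens; not a quoted price list.
PRICE = 15.0

cost_4k = prompt_cost_usd(4_000, PRICE)
cost_32k = prompt_cost_usd(32_000, PRICE)
cost_100k = prompt_cost_usd(100_000, PRICE)

# Under linear pricing the cost ratio equals the token ratio.
ratio_32k = cost_32k / cost_4k
ratio_100k = cost_100k / cost_4k

print(f"4k: ${cost_4k:.2f}  32k: ${cost_32k:.2f}  100k: ${cost_100k:.2f}")
print(f"32k is {ratio_32k:.0f}x and 100k is {ratio_100k:.0f}x the cost of 4k")
```

The ratios do not depend on the rate chosen, only on the token counts, which is why trimming retrieved context is the main cost lever here.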
Journey Context:
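Teams send 100k tokens to Claude 3 Opus on the assumption that more context means better answers. Claude 3 Opus input is priced at $15 per 1M tokens regardless of prompt length ($75 per 1M is the output rate), so a 100k-token prompt costs about $1.50 per request versus about $0.06 for a 4k-token prompt, a 25x difference. Research on the 'lost in the middle' effect shows that models often under-use content placed in the middle of long contexts. Chunk to 512-1k tokens, retrieve the top 5, and keep the total under 4k tokens (which means averaging under about 800 tokens per chunk when all 5 are used). This is cheaper, fits within the context windows of cheaper models, and avoids the U-shaped attention curve in which middle information is lost.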
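The chunk, top-5, under-4k rule above can be sketched as a small budgeted selection step. This is a hedged illustration, not a reference implementation: the `Chunk` type, the `select_context` function, and the sample scores and token counts are all hypothetical, and token counts are assumed to come from whatever tokenizer matches the target model.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # retrieval similarity; higher means more relevant
    tokens: int   # token count, assumed precomputed with the model's tokenizer


def select_context(chunks, top_k=5, max_tokens=4000):
    """Greedily keep the highest-scoring chunks, at most top_k of them,
    while the running total stays strictly under max_tokens."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if len(selected) == top_k:
            break
        if used + chunk.tokens >= max_tokens:
            continue  # this chunk would reach the budget; try a smaller one
        selected.append(chunk)
        used += chunk.tokens
    return selected, used


# Hypothetical retrieval results, for illustration only.
candidates = [
    Chunk("a", 0.90, 1000),
    Chunk("b", 0.85, 900),
    Chunk("c", 0.80, 800),
    Chunk("d", 0.75, 1000),
    Chunk("e", 0.70, 700),
    Chunk("f", 0.65, 250),
]

selected, used = select_context(candidates)
print([c.text for c in selected], used)
```

In this sample, chunk "e" is skipped because it would push the total past the budget, and the smaller chunk "f" takes its place. Skipping rather than stopping is a design choice; stopping at the first overflow would be stricter about relevance order.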
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:29:43.266278+00:00— report_created — created