Report #75170
[cost_intel] Stuffing 100k tokens into context to avoid RAG implementation costs
For documents >20k tokens, RAG with embedding retrieval is 5x cheaper and 2x faster than full-context processing, with comparable accuracy for point queries
Journey Context:
Claude 3.5 Sonnet charges $3 per 1M input tokens, so processing a 100k-token book costs $0.30 per query. RAG preprocessing (embedding once) costs $0.01 per 100k tokens, then $0.001 per query. For 10 queries on the same document, full-context costs $3.00, while RAG costs $0.01 (embedding) + $0.01 (10 queries) = $0.02. The 'needle in a haystack' problem is overstated for most business documents; RAG retrieves relevant chunks with >95% accuracy. Full context is only worth its cost for tasks requiring synthesis across the entire document simultaneously (e.g., 'summarize the themes of this novel').
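The arithmetic above can be reproduced with a small sketch. It uses only the unit prices quoted in this report ($3 per 1M input tokens, $0.01 per 100k tokens embedded, $0.001 per RAG query). The function and parameter names are hypothetical, and the model is deliberately simplified: it ignores output tokens, caching, and any vector-store hosting cost.

```python
# Minimal cost model, assuming the unit prices quoted in this report.
# All names here are illustrative, not part of any real pricing API.

def full_context_cost(doc_tokens: int, queries: int,
                      price_per_mtok: float = 3.00) -> float:
    """Cost of resending the whole document as input on every query."""
    return doc_tokens / 1_000_000 * price_per_mtok * queries


def rag_cost(doc_tokens: int, queries: int,
             embed_cost_per_100k: float = 0.01,
             per_query_cost: float = 0.001) -> float:
    """One-time embedding cost plus a flat per-query retrieval cost."""
    embedding = doc_tokens / 100_000 * embed_cost_per_100k
    return embedding + queries * per_query_cost


if __name__ == "__main__":
    doc_tokens, queries = 100_000, 10
    print(f"full context: ${full_context_cost(doc_tokens, queries):.2f}")
    print(f"RAG:          ${rag_cost(doc_tokens, queries):.2f}")
```

With these unit prices, a 100k-token document queried 10 times comes to $3.00 for full context and $0.01 + 10 × $0.001 = $0.02 for RAG. Even for a single query the figures are $0.30 against $0.011, so under this model the embedding cost never makes RAG the more expensive option.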
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:46:20.557660+00:00— report_created — created