Report #100194
[research] RAG or long-context prompt for large knowledge bases?
Use RAG for large, dynamic corpora where each query needs only a small subset; use long-context only when the task requires reasoning across the whole document and latency/cost are acceptable. Combine both: retrieve summaries/chunks first, then load full documents only when deeper analysis is needed.
Journey Context:
Long-context windows are usable in practice but suffer from 'lost in the middle' position bias and O(n²) attention cost, so latency and price rise sharply with context length. RAG keeps per-request tokens small and answers fresh, but its quality depends on retrieval, chunking, and embedding choice. Empirical studies report that long-context generally outperforms RAG when fully resourced, while RAG is far cheaper; a hybrid router that sends only the queries needing deeper analysis to long-context gives most of the accuracy at a fraction of the cost.
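The hybrid approach described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Doc` type, the `score`, `retrieve`, and `build_context` helpers, the word-overlap scoring (a stand-in for embedding similarity), the whitespace token estimate, and the `needs_deep_analysis` flag (assumed to be supplied by the caller or an upstream classifier) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    summary: str    # short chunk/summary indexed for retrieval
    full_text: str  # complete document, loaded only when needed


def score(query: str, text: str) -> float:
    """Toy relevance: fraction of query words present in the text.
    Assumption: a real system would use embedding similarity instead."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Rank documents by summary relevance and keep the top k."""
    ranked = sorted(docs, key=lambda d: score(query, d.summary), reverse=True)
    return ranked[:k]


def build_context(query: str, docs: list[Doc], needs_deep_analysis: bool,
                  token_budget: int = 2000, k: int = 2) -> tuple[str, list[str]]:
    """Route a query: summaries only (RAG path), or full documents
    for the retrieved hits while they fit in the token budget."""
    hits = retrieve(query, docs, k)
    if not needs_deep_analysis:
        return "rag", [d.summary for d in hits]

    context, used = [], 0
    for d in hits:
        cost = len(d.full_text.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            context.append(d.summary)    # over budget: fall back to the summary
        else:
            context.append(d.full_text)
            used += cost
    return "long_context", context


docs = [
    Doc("a", "refund policy for orders", "Full refund policy text " * 3),
    Doc("b", "shipping times by region", "shipping details"),
    Doc("c", "privacy policy overview", "privacy full"),
]
mode, context = build_context("refund policy", docs, needs_deep_analysis=False)
```

The retrieval step always runs first, so the expensive long-context path only ever sees documents that were already judged relevant; the token budget caps how much the cost can grow per request.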
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:49:00.920442+00:00— report_created — created