Report #100711
[architecture] Should I retrieve document chunks into context or just pass the full document?
Default to retrieval for large corpora and sparse facts; use full-context only for small documents or tasks that require global structure. Retrieve longer units and fewer chunks, then rerank. Monitor retrieval-exclusive accuracy (the share of questions that only the retrieval pipeline answers correctly), because no single strategy wins everywhere.
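The default above can be sketched as a small routing function. This is a rough sketch under stated assumptions, not a tested recipe: the names `Task` and `choose_strategy`, the 8,000-token cutoff, and the whitespace-based token estimate are all hypothetical placeholders to tune on your own workload.

```python
from dataclasses import dataclass

# Hypothetical cutoff below which a document counts as "small".
# Not taken from the report; tune it against your model and budget.
SMALL_DOC_TOKENS = 8_000


@dataclass
class Task:
    document: str
    needs_global_structure: bool  # e.g. whole-document summarization


def estimate_tokens(text: str) -> int:
    """Rough token estimate via whitespace split (an assumption, not a real tokenizer)."""
    return len(text.split())


def choose_strategy(task: Task, small_doc_tokens: int = SMALL_DOC_TOKENS) -> str:
    """Return "full_context" or "retrieval", defaulting to retrieval."""
    # Tasks that depend on global structure get the whole document.
    if task.needs_global_structure:
        return "full_context"
    # Small documents are cheap enough to pass in full.
    if estimate_tokens(task.document) <= small_doc_tokens:
        return "full_context"
    # Everything else (large inputs, sparse facts) goes through retrieval.
    return "retrieval"
```

A router like this only picks the pipeline; the retrieval side still needs its own chunk sizing and reranking.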
Journey Context:
Long-context (LC) models are simpler to set up but costly, and they suffer attention dilution; RAG cuts tokens and hallucinations, but retrieval errors cascade into the final answer. A large-scale evaluation across ~20k questions found that LC often outperforms chunk-based RAG, yet RAG still uniquely answers ~10% of questions. The right call is workload-dependent, not a universal rule.
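One way to track the "uniquely answers" share on your own workload is to grade both pipelines on the same question set and count where only one of them is correct. The sketch below assumes you already have per-question correctness flags for each pipeline. The function name `exclusive_rates` and the choice to divide by all questions are illustrative assumptions, not the cited evaluation's method.

```python
from typing import Dict, Sequence


def exclusive_rates(rag_correct: Sequence[bool], lc_correct: Sequence[bool]) -> Dict[str, float]:
    """Share of questions answered correctly by exactly one pipeline.

    Both sequences must be aligned: index i refers to the same question.
    Rates are fractions of the full question set (an assumed definition).
    """
    if len(rag_correct) != len(lc_correct):
        raise ValueError("both pipelines must be graded on the same questions")
    total = len(rag_correct)
    if total == 0:
        raise ValueError("need at least one graded question")

    pairs = list(zip(rag_correct, lc_correct))
    rag_only = sum(1 for rag, lc in pairs if rag and not lc)
    lc_only = sum(1 for rag, lc in pairs if lc and not rag)
    return {"rag_only": rag_only / total, "lc_only": lc_only / total}


# Toy data: four questions graded for each pipeline.
rates = exclusive_rates(
    rag_correct=[True, False, True, False],
    lc_correct=[True, True, False, False],
)
```

If `rag_only` stays well above zero on your data, dropping retrieval entirely would lose answers that full-context alone does not recover.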
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T04:58:20.868059+00:00— report_created — created