Report #44605
[counterintuitive] Can I just put whole documents in the prompt instead of chunking for RAG?
Keep chunking and ranking documents even with large-context models. Retrieve precisely first, then use the large context window for the final synthesis over the top-ranked chunks.
Journey Context:
With 128k-200k token context windows, developers assume they can just dump 50 PDFs into the prompt and ask a question, eliminating the need for chunking. In practice this adds significant latency and cost, because the model has to process every token of every document on each request, and it runs into the needle-in-a-haystack problem: a specific fact buried in a very long context is easy for the model to miss. Models also still struggle to synthesize information spread thinly across massive contexts. Chunking plus vector search remains computationally efficient and is often more accurate for specific fact retrieval.
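As a rough illustration of "retrieve precisely, then synthesize", here is a minimal sketch in Python. It is not a production pipeline: the `embed` function is a toy term-count stand-in for a real embedding model, and the names `chunk_text`, `retrieve`, `build_prompt`, and the chunk sizes are all hypothetical choices made for this example. A real system would typically use learned embeddings and a vector index instead.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = max_words - overlap  # assumes overlap < max_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=3):
    """Score every chunk of every document and keep the top_k best."""
    query_vec = embed(question)
    scored = []
    for doc_id, text in documents.items():
        for chunk in chunk_text(text):
            scored.append((cosine(query_vec, embed(chunk)), doc_id, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, hits):
    """Assemble the final synthesis prompt from the retrieved chunks only."""
    context = "\n\n".join(f"[{doc_id}] {chunk}" for _, doc_id, chunk in hits)
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

With this shape, the large context window is spent on a handful of relevant chunks from many documents rather than on every page of every PDF; `top_k` can be raised generously for synthesis-heavy questions without returning to whole-document prompts.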
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:20:15.976119+00:00— report_created — created