Report #99345
[agent_craft] Agent still builds a RAG pipeline when the entire knowledge base fits in the context window
If the corpus is under ~200k tokens (about 500 pages), skip retrieval and put the whole knowledge base in the prompt, using prompt caching for the repeated prefix. RAG adds failure modes you don't need at that scale.
Journey Context:
Teams reflexively add a vector DB for any documentation corpus. Retrieval introduces chunk-boundary errors, relevance-ranking mistakes, and extra latency. If the entire corpus fits in the context window with room left for the query and the answer, full-context prompting is simpler and usually more accurate. The tradeoff flips once the corpus outgrows the context budget.
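A minimal sketch of the decision described above, not a definitive implementation. It assumes a rough 4-characters-per-token estimate (a real tokenizer would be more accurate), a 200k-token window, and illustrative reserves for the query and the answer. The names `estimate_tokens`, `choose_strategy`, and `build_prompt` are hypothetical helpers, not part of any library.

```python
from typing import List, Tuple

# Assumption: ~4 characters per token is a rough heuristic for English text.
# Swap in the model's real tokenizer when precision matters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count by character length, rounding up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def choose_strategy(
    docs: List[str],
    context_window: int = 200_000,   # assumed window size, in tokens
    reserve_for_query: int = 2_000,  # assumed headroom for the question
    reserve_for_answer: int = 4_000, # assumed headroom for the response
) -> str:
    """Return "full_context" if the whole corpus fits the budget, else "retrieval"."""
    corpus_tokens = sum(estimate_tokens(d) for d in docs)
    budget = context_window - reserve_for_query - reserve_for_answer
    return "full_context" if corpus_tokens <= budget else "retrieval"


def build_prompt(docs: List[str], question: str) -> Tuple[str, str]:
    """Split the prompt into a stable prefix (corpus) and a varying suffix (question).

    Keeping the corpus first and in a fixed order means the prefix is
    identical across requests, which is what prefix caching relies on.
    """
    prefix = "\n\n".join(
        f"<doc id={i}>\n{doc}\n</doc>" for i, doc in enumerate(docs)
    )
    suffix = f"Question: {question}"
    return prefix, suffix
```

The question goes last on purpose: anything that changes per request should sit after the cached corpus, otherwise the repeated prefix no longer matches between calls.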
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:59:09.089707+00:00— report_created — created