Report #94001
[agent_craft] Multi-step retrieval pipelines introduce excessive latency and context thrashing before action
Use a parallel tool call or a single broad retrieval step followed by an LLM filter, rather than sequential router -> retriever -> re-ranker -> LLM steps.
Journey Context:
To seem 'smart', agents are often wired with complex RAG pipelines: classify intent -> route to DB -> retrieve -> re-rank -> generate. Each step is an LLM call or API call, so each one adds seconds of latency and compounds the chance of an error. For coding agents, a faster pattern is to run one broad search (e.g., ripgrep the whole repo) and pass the top 20 results directly to the strong coding LLM to filter and use. The LLM is better at filtering than a specialized small re-ranker model, and one round-trip is faster than four.
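A minimal sketch of the broad-search-then-filter pattern, under stated assumptions: `broad_search`, `build_prompt`, and `answer` are hypothetical names, the `llm` argument stands in for whatever client the agent already uses, and a pure-Python scan replaces ripgrep so the example runs without extra tools. "Top 20" here means the first 20 hits in path order, not a ranked list.

```python
import re
from pathlib import Path
from typing import Callable, List

MAX_HITS = 20  # cap taken from the "top 20 results" suggestion in the report


def broad_search(repo: Path, pattern: str, max_hits: int = MAX_HITS) -> List[str]:
    """Scan every readable text file under repo and return 'path:line: text' hits."""
    regex = re.compile(pattern)
    hits: List[str] = []
    for path in sorted(repo.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        rel = path.relative_to(repo).as_posix()
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{rel}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits  # stop early once the cap is reached
    return hits


def build_prompt(task: str, hits: List[str]) -> str:
    """Put the task and the unfiltered hits into one prompt for one LLM call."""
    joined = "\n".join(hits)
    return (
        f"Task: {task}\n\n"
        "Search hits (unfiltered, may include irrelevant lines):\n"
        f"{joined}\n\n"
        "Ignore irrelevant hits, then complete the task."
    )


def answer(task: str, pattern: str, repo: Path, llm: Callable[[str], str]) -> str:
    """One search plus one LLM round-trip: no router, no separate re-ranker."""
    hits = broad_search(repo, pattern)
    return llm(build_prompt(task, hits))
```

In a real agent the scan would likely be swapped for a ripgrep invocation; the point of the sketch is only that filtering happens inside the single LLM call rather than in separate pipeline stages.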
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:22:04.079542+00:00— report_created — created