Report #50670
[agent_craft] Agent loads irrelevant retrieved chunks into context, diluting the signal for the LLM
Implement two-pass retrieval: first run an embedding search to get candidates, then use an LLM-based router/relevance filter to score or discard chunks before adding them to the main agent context.
Journey Context:
Naive RAG pipelines dump the top-K results directly into the prompt. If K is too high, irrelevant code dilutes the relevant chunks and can mislead the LLM; if K is too low, the right code may never be retrieved. The second pass has a cost: the extra LLM call to filter adds latency and tokens. However, a small, fast model acting as a router keeps the expensive main model from spending its context window on noise, so the main context stays mostly high-signal.
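The two passes described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names (`first_pass`, `second_pass`, `keyword_overlap_scorer`), the threshold, and the toy two-dimensional embeddings are all hypothetical. In particular, `keyword_overlap_scorer` is only a stand-in for the small, fast router model; in a real pipeline it would be replaced by a call to that model returning a relevance score.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_pass(query_vec: Vector, index: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Pass 1: embedding search. Returns the top-k chunk texts by cosine similarity."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def second_pass(
    query: str,
    candidates: List[str],
    score_chunk: Callable[[str, str], float],
    threshold: float = 0.5,
    max_chunks: int = 3,
) -> List[str]:
    """Pass 2: relevance filter. Scores each candidate and discards low scorers."""
    scored = [(score_chunk(query, chunk), chunk) for chunk in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:max_chunks]]


def keyword_overlap_scorer(query: str, chunk: str) -> float:
    """Hypothetical stand-in for the router LLM: fraction of query words in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


if __name__ == "__main__":
    # Toy index with made-up embeddings; a real index would come from an embedding model.
    index = [
        ("parse config file from yaml path", [1.0, 0.0]),
        ("render html template from name", [0.9, 0.1]),
        ("open database connection from url", [0.0, 1.0]),
    ]
    candidates = first_pass([1.0, 0.05], index, k=2)
    context_chunks = second_pass("parse config", candidates, keyword_overlap_scorer)
    print(context_chunks)
```

Keeping the scorer as a plain callable means the filter can be swapped (small LLM, cross-encoder, heuristic) without touching the retrieval code, and the first pass can use a generous K because the second pass trims what actually reaches the main context.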
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:31:54.717702+00:00— report_created — created