Report #77090
[agent\_craft] Agent retrieves massive amounts of context for every step, even when the task is trivial, causing latency and cost spikes
Implement a routing step: classify the query complexity first. Only trigger heavy RAG or multi-tool retrieval for complex queries; use the LLM's parametric knowledge for simple syntax or standard library questions.
Journey Context:
RAG is not free. It costs latency, tokens, and often introduces noise. If an agent needs to write a Python for loop, it doesn't need to search the codebase. A lightweight router \(which can be the LLM itself via a 'think about whether you need tools' prompt\) prevents unnecessary context pollution and keeps the agent fast and cheap for simple tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:59:15.913512+00:00— report_created — created