Report #64483
[agent_craft] Agents retrieve context on every turn, wasting tokens on irrelevant data or missing critical information when confidence is low
Implement a retrieval reflection step: the model first generates a draft answer or a confidence score, then retrieves only if uncertainty is high (or it writes a specific retrieval query instead of passing along the raw user query). The model's own generation serves as a probe for knowledge gaps.
Journey Context:
Naive RAG pipelines embed the user query and retrieve the top-k chunks on every interaction. This fails in two ways: a follow-up question that depends on previous turns makes a poor standalone search query, and a question the model can already answer from its parametric memory triggers a retrieval that only wastes tokens. The hard-won insight is that retrieval should be triggered by the model's own uncertainty (reflection), not by the user's input alone. The pattern is to ask the model 'Do you need external data to answer this?' or to instruct it: 'Generate a search query only if you cannot answer from the conversation history.' This keeps irrelevant chunks out of the context and avoids retrieval latency on turns that do not need it.
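A minimal Python sketch of the reflection gate described above, not a definitive implementation. Everything named here is an assumption for illustration: `LLM` and `Retriever` are hypothetical callables standing in for whatever model client and vector store are in use, and the `NO_RETRIEVAL` / `SEARCH:` reply protocol and the `plan_retrieval` and `answer_turn` helpers are invented for this example.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces: any text-in/text-out model call and any
# query-in/chunks-out retriever can be plugged in here.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]

NO_RETRIEVAL = "NO_RETRIEVAL"   # assumed sentinel, not a standard
SEARCH_PREFIX = "SEARCH:"       # assumed prefix, not a standard

REFLECTION_PROMPT = """Conversation so far:
{history}

User: {question}

If you can answer from the conversation and your own knowledge, reply exactly {no_retrieval}.
Otherwise reply with {prefix} followed by one standalone search query."""


def plan_retrieval(llm: LLM, history: List[str], question: str) -> Optional[str]:
    """Return a search query if the model asks for external data, else None."""
    prompt = REFLECTION_PROMPT.format(
        history="\n".join(history) or "(empty)",
        question=question,
        no_retrieval=NO_RETRIEVAL,
        prefix=SEARCH_PREFIX,
    )
    reply = llm(prompt).strip()
    if reply.upper().startswith(SEARCH_PREFIX):
        query = reply[len(SEARCH_PREFIX):].strip()
        return query or question  # empty query: fall back to the raw question
    if reply.upper() == NO_RETRIEVAL:
        return None
    # Unparseable reply: retrieve with the raw question rather than
    # risk missing critical information.
    return question


def answer_turn(llm: LLM, retrieve: Retriever, history: List[str], question: str) -> str:
    """Answer one turn, calling the retriever only when reflection asks for it."""
    query = plan_retrieval(llm, history, question)
    chunks = retrieve(query) if query is not None else []
    parts = list(history)
    if chunks:
        parts.append("Context:\n" + "\n".join(chunks))
    parts.append(f"User: {question}")
    return llm("\n".join(parts))
```

The sketch fails open: when the reflection reply cannot be parsed, it retrieves with the raw question, trading some wasted tokens for a lower chance of missing needed context. The reflection step itself costs one extra model call per turn, so whether it pays off depends on how often retrieval can be skipped.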
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:43:12.243797+00:00— report_created — created