Report #77999
[frontier] Naive RAG retrieves context once, before generation, and so misses critical details that only emerge during the reasoning process
Implement active retrieval (FLARE-style): the model generates partial reasoning drafts, low-confidence tokens or explicit markers in those drafts trigger parallel retrieval queries, and the results are injected into subsequent generation steps.
Journey Context:
Retrieve-then-generate assumes the initial query contains all necessary context, but complex reasoning reveals information needs iteratively. The active retrieval pattern (FLARE, Forward-Looking Active REtrieval augmented generation) uses the generation process itself to identify knowledge gaps. To implement it, have the model generate a tentative draft and flag low-confidence tokens, either from token logprobs or from explicit markers the model is prompted to emit. Use the flagged draft segments to formulate retrieval queries, and run those queries in parallel with continued generation. When results return, inject them into the context window at predetermined injection points (e.g., after the current sentence) and continue generation from there. Injected passages compete with the draft for context-window space, so budget the window explicitly and trim retrieved passages before trimming draft content.
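The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `generate` and `retrieve` are hypothetical callables standing in for a model call that returns `(token, logprob)` pairs for the next tentative sentence and for a search backend, and the `threshold`, `max_chars` budget, and the choice to issue one query per uncertain token are illustrative. The sketch fans out the queries for one draft in parallel but waits for them before regenerating; overlapping retrieval with ongoing decoding would need an asynchronous generator.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not a real library API):
#   Generator: prompt -> [(token, logprob), ...] for the next tentative sentence
#   Retriever: query  -> list of passage strings
Draft = List[Tuple[str, float]]
Generator = Callable[[str], Draft]
Retriever = Callable[[str], List[str]]


def low_confidence(draft: Draft, threshold: float) -> List[str]:
    """Return the tokens whose probability falls below the threshold."""
    return [tok for tok, lp in draft if math.exp(lp) < threshold]


def masked_query(draft: Draft, threshold: float) -> str:
    """Use the draft as a query, dropping the tokens the model was unsure of."""
    return " ".join(tok for tok, lp in draft if math.exp(lp) >= threshold)


def build_prompt(question: str, passages: List[str], answer: str, max_chars: int) -> str:
    """Place passages before the question; trim passages, never the draft answer."""
    fixed = f"Question: {question}\nAnswer so far: {answer}"
    budget = max_chars - len(fixed)
    kept: List[str] = []
    for passage in passages:
        if len(passage) + 1 > budget:  # +1 for the joining newline
            break
        kept.append(passage)
        budget -= len(passage) + 1
    return "\n".join(kept) + "\n" + fixed if kept else fixed


def active_generate(question: str, generate: Generator, retrieve: Retriever,
                    threshold: float = 0.5, max_sentences: int = 4,
                    max_chars: int = 2000) -> Tuple[str, List[str]]:
    """Generate sentence by sentence, retrieving only when the draft is uncertain."""
    answer: List[str] = []
    passages: List[str] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(max_sentences):
            prompt = build_prompt(question, passages, " ".join(answer), max_chars)
            draft = generate(prompt)
            if not draft:  # an empty draft signals the model is done
                break
            uncertain = low_confidence(draft, threshold)
            if uncertain:
                # Illustrative query set: the masked draft plus one query per
                # uncertain token; empty queries are skipped.
                queries = [masked_query(draft, threshold)]
                queries += [f"{question} {tok}" for tok in uncertain]
                queries = [q for q in queries if q.strip()]
                for hits in pool.map(retrieve, queries):  # parallel fan-out
                    for hit in hits:
                        if hit not in passages:
                            passages.append(hit)
                # Injection point: before the sentence that triggered retrieval,
                # which is discarded and regenerated with the new evidence.
                prompt = build_prompt(question, passages, " ".join(answer), max_chars)
                draft = generate(prompt)
                if not draft:
                    break
            answer.append(" ".join(tok for tok, _ in draft))
    return " ".join(answer), passages
```

Keeping the question and the accepted answer in a fixed portion of the prompt, and charging only retrieved passages against the remaining budget, is one way to honor the constraint that injected results must not displace draft content. A production version would count tokens with the model's tokenizer rather than characters.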
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:30:52.036743+00:00— report_created — created