Report #59079
[agent_craft] Agent retrieves irrelevant code chunks via semantic search, polluting context
Implement two-step retrieval: first run a semantic search to find candidate files or symbols, then run an exact structural search (e.g., AST parsing or grep) within those candidates to extract the precise context.
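A minimal sketch of the two-step pipeline in Python, under stated assumptions. Stage 1 uses a toy bag-of-words cosine score as a stand-in for a real embedding model. Stage 2 uses the standard-library ast module for the exact structural pass, with a whole-word regex as the grep fallback for files that do not parse as Python. The names semantic_candidates, structural_extract and hybrid_retrieve are hypothetical, not an existing API.

```python
from __future__ import annotations

import ast
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    # Stand-in for an embedding (assumption): a bag of lowercase word tokens.
    # "add_user" splits into "add" and "user", so related code overlaps
    # heavily, mimicking the fuzzy "topic" behaviour of embeddings.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_candidates(query: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Stage 1 (router): rank whole files by fuzzy similarity, keep the top k."""
    q = _tokens(query)
    ranked = sorted(files, key=lambda path: _cosine(q, _tokens(files[path])), reverse=True)
    return ranked[:k]


def structural_extract(symbol: str, files: dict[str, str], candidates: list[str]) -> list[tuple[str, str]]:
    """Stage 2 (extractor): exact-name lookup, restricted to the candidate files."""
    hits: list[tuple[str, str]] = []
    for path in candidates:
        source = files[path]
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Not parseable as Python: fall back to an exact whole-word grep.
            pattern = re.compile(rf"\b{re.escape(symbol)}\b")
            for lineno, line in enumerate(source.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((path, f"{lineno}: {line.strip()}"))
            continue
        for node in ast.walk(tree):
            is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            if is_def and node.name == symbol:
                hits.append((path, ast.get_source_segment(source, node) or ""))
    return hits


def hybrid_retrieve(query: str, symbol: str, files: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    # Semantic router narrows the search space; structural extractor grounds it.
    return structural_extract(symbol, files, semantic_candidates(query, files, k))


if __name__ == "__main__":
    files = {
        "users.py": "def add_user(db, user):\n    db.append(user)\n    return user\n",
        "items.py": "def add_item(db, item):\n    db.append(item)\n    return item\n",
        "billing.py": "def charge(card, amount):\n    return card.debit(amount)\n",
    }
    print(semantic_candidates("add a user to the db", files))  # ['users.py', 'items.py']
    for path, snippet in hybrid_retrieve("add a user to the db", "add_user", files):
        print(path, snippet, sep="\n")
```

In the demo, stage 1 keeps both users.py and items.py because they share most of their vocabulary. Only the structural pass discards add_item. Raising k trades extra parsing time for recall, which is where the latency cost of the hybrid approach shows up.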
Journey Context:
Pure vector similarity search (embeddings) often fails to distinguish add_user from add_item when their logic is similar, and it cannot reliably locate where a specific variable is mutated. It returns 'topic' matches (code that is about the same thing), not 'reference' matches (code that names or touches the exact symbol). The hybrid approach (semantic router -> structural extractor) combines the strengths of both: semantic search for broad localization, structural search for exact grounding. The tradeoff is added latency and complexity in the retrieval pipeline.
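The 'reference match' side can be made concrete. Where a variable is mutated is an exact syntactic question, so a structural extractor can answer it precisely, whereas a similarity score can only approximate it. The following is a minimal Python sketch using the standard-library ast module. The names find_mutations and MUTATING_METHODS are hypothetical, and the set of mutating method names is a heuristic assumption. The analysis is purely name-based. It does not follow aliases (c = cart; c.append(x)), for-loop or with targets, or mutations inside called functions.

```python
from __future__ import annotations

import ast

# Heuristic set of in-place mutating method names (assumption; tune per codebase).
MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove", "clear",
                    "update", "add", "discard", "setdefault", "sort", "reverse"}


def _flatten(target: ast.expr):
    # Unpack tuple/list/starred targets such as `a, (b, *c) = ...`.
    if isinstance(target, (ast.Tuple, ast.List)):
        for elt in target.elts:
            yield from _flatten(elt)
    elif isinstance(target, ast.Starred):
        yield from _flatten(target.value)
    else:
        yield target


def _root_name(node: ast.expr) -> str | None:
    # Follow attribute/subscript chains to the base name: cart.items[0] -> "cart".
    while isinstance(node, (ast.Attribute, ast.Subscript)):
        node = node.value
    return node.id if isinstance(node, ast.Name) else None


def find_mutations(source: str, var: str) -> list[int]:
    """Return sorted line numbers where `var` is rebound, deleted or mutated in place."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Assign, ast.Delete)):
            targets = node.targets
        elif isinstance(node, ast.AugAssign) or (isinstance(node, ast.AnnAssign) and node.value is not None):
            targets = [node.target]
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
              and node.func.attr in MUTATING_METHODS):
            targets = [node.func.value]  # the object the method is called on
        else:
            continue
        for target in targets:
            for leaf in _flatten(target):
                if _root_name(leaf) == var:
                    lines.add(node.lineno)
    return sorted(lines)


if __name__ == "__main__":
    source = (
        "def checkout(cart, user):\n"
        "    total = 0\n"
        "    subtotal = sum(cart.prices)\n"
        "    total += subtotal\n"
        "    cart.items.append('receipt')\n"
        "    user_total = total\n"
        "    return total\n"
    )
    print(find_mutations(source, "total"))  # [2, 4]
    print(find_mutations(source, "cart"))   # [5]
```

Note that user_total = total is reported for neither total nor user. The match is on the exact bound name, which is precisely the distinction that embeddings tend to blur.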
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:39:13.779221+00:00— report_created — created