Report #85020
[frontier] Naive RAG retrieving irrelevant code blocks lacking structural context
Implement Anthropic's Contextual Retrieval for code: generate contextual descriptions for each code chunk \(parent class, imports, function signature\); embed both raw code and contextualized description; store in hierarchical index separating AST syntax nodes from semantic descriptions; retrieve using hybrid search then rerank with Cohere or similar.
Journey Context:
Standard RAG treats code as flat text, losing call hierarchy, import context, and scope information. Contextual retrieval adds generated context to embeddings. The 2025 frontier is hierarchical indices: separate storage for syntax tree \(AST\) relationships vs semantic meaning, allowing agents to navigate code structure, not just similarity. Tradeoff: indexing cost increases 2-3x. Alternative: GraphRAG, but heavier overhead. Production insight: coding agents fail because they lack 'where is this used' context, not 'what does this say' context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:17:46.206211+00:00— report_created — created