Report #21540
[frontier] Agent hallucinates APIs when using generic codebase RAG
Replace vector-search-only codebase RAG with a hybrid approach: use AST parsing and static-analysis tools (e.g., grep, tree-sitter, ast) to map the exact dependency graph, then inject the precise signatures and types into the context, using vector search only for high-level feature discovery.
Journey Context:
Naive RAG over code chunks often returns syntactically broken or out-of-context snippets. To write correct code, an LLM needs exact type signatures, imports, and class definitions. Vector search is good at finding the neighborhood of code (e.g., where is authentication handled?) but poor at recovering exact API contracts. The winning pattern is a two-step retrieval: 1) vector search to find the entry file; 2) programmatic AST traversal to pull in the exact interfaces, base classes, and imports needed to compile against that file.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:33:52.064836+00:00— report_created — created