Report #58211
[synthesis] AI coding agent gives irrelevant or hallucinated answers because it lacks the right context from the codebase
Invest 70%+ of engineering effort in the context retrieval pipeline, not in prompt engineering or model selection. Implement hybrid retrieval: combine semantic search (embeddings) with lexical/keyword search (BM25 or ripgrep-style), then merge and re-rank the results. Index at multiple granularities: file level, function level, and symbol level. The LLM call is a commodity; the context pipeline is the product moat.
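A minimal sketch of the merge step, to make the hybrid idea concrete. It is an illustration under stated assumptions, not the method of any product named in this report: the fusion rule is reciprocal rank fusion (one common choice; the report does not name one), the BM25-style scorer is a toy in-memory version, and `semantic_rank` uses character-trigram cosine similarity purely as a stand-in for a real embedding model. All function names and the chunk-id format are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; underscores split identifiers like parse_config.
    return re.findall(r"[a-z0-9]+", text.lower())

def lexical_rank(query, chunks):
    """Rank chunk ids with a toy BM25-style score (stand-in for a real index)."""
    docs = {cid: tokenize(text) for cid, text in chunks.items()}
    n = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    k1, b = 1.5, 0.75
    scores = {}
    for cid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
                score += idf * tf[term] * (k1 + 1) / norm
        scores[cid] = score
    ranked = sorted(scores, key=lambda c: scores[c], reverse=True)
    return [cid for cid in ranked if scores[cid] > 0]  # drop non-matches

def trigram_vector(text):
    s = re.sub(r"\s+", " ", text.lower())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_rank(query, chunks):
    """ASSUMPTION: trigram cosine stands in for embedding similarity."""
    q = trigram_vector(query)
    scores = {cid: cosine(q, trigram_vector(text)) for cid, text in chunks.items()}
    return sorted(scores, key=lambda c: scores[c], reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    fused = Counter()
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] += 1.0 / (k + rank)
    return sorted(fused, key=lambda c: fused[c], reverse=True)

def hybrid_search(query, chunks, per_source=50):
    """Fuse the top results of both retrievers into one candidate list."""
    if not chunks:
        return []
    return reciprocal_rank_fusion([
        lexical_rank(query, chunks)[:per_source],
        semantic_rank(query, chunks)[:per_source],
    ])

if __name__ == "__main__":
    # Hypothetical chunk ids mixing function-level and file-level granularity.
    sample = {
        "config.py::parse_config": "def parse_config(path): load yaml settings from path",
        "http.py::fetch": "def fetch(url): send an http get request and return the body",
        "README.md": "project overview and installation instructions",
    }
    print(hybrid_search("parse_config yaml", sample))
```

Rank-based fusion is used here because lexical and embedding scores live on different scales; combining positions avoids having to normalize them. The fused list is meant as a recall-oriented candidate set that a separate re-ranker would then narrow down.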
Journey Context:
Teams commonly spend most of their effort on prompt engineering and model selection, treating context as a "just stuff it in the prompt" problem. This fails because: (1) context windows are finite even at 200k tokens, (2) irrelevant context degrades model performance via lost-in-the-middle effects, and (3) real codebases are too large to include wholesale. The pattern across successful products is consistent. Cursor's codebase indexing uses hybrid embedding + keyword search with multi-granularity indexing. Perplexity's entire product is a context pipeline (query decomposition → parallel search → retrieve → re-rank → synthesize). Devin maintains a curated workspace context. The counter-intuitive finding from multiple products is that adding more context hurts. The skill is selecting the right 5-20 chunks, not filling the window with as many tokens as it will hold. Re-ranking is where the quality lives: initial retrieval is recall-oriented and should cast a wide net, while re-ranking is precision-oriented and decides which few candidates actually reach the model. Products that win have better retrieval pipelines, not better prompts. The moat is in the indexing infrastructure, not the system prompt.
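The selection step described above (wide recall first, then a precision pass that keeps only a handful of chunks) could look roughly like the sketch below. Everything in it is an assumption for illustration: `Chunk`, `select_context`, and `overlap_score` are hypothetical names, the 4-characters-per-token estimate is a crude heuristic rather than a tokenizer, and `overlap_score` is a toy stand-in for whatever re-ranker (for example a cross-encoder) a real pipeline would plug in as `score_fn`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str

def estimate_tokens(text: str) -> int:
    # ASSUMPTION: roughly 4 characters per token; replace with a real tokenizer.
    return max(1, len(text) // 4)

def select_context(
    query: str,
    candidates: Iterable[Chunk],
    score_fn: Callable[[str, str], float],
    max_chunks: int = 20,
    token_budget: int = 8000,
    min_score: float = 0.0,
) -> List[Chunk]:
    """Re-rank recall-oriented candidates and keep only the best few.

    Stops at max_chunks or at the first chunk scoring at or below min_score,
    and skips any chunk that would push the total over token_budget.
    """
    scored = [(score_fn(query, c.text), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected: List[Chunk] = []
    used = 0
    for score, chunk in scored:
        if len(selected) >= max_chunks or score <= min_score:
            break
        cost = estimate_tokens(chunk.text)
        if used + cost > token_budget:
            continue  # too large to fit; a smaller chunk further down may fit
        selected.append(chunk)
        used += cost
    return selected

def overlap_score(query: str, text: str) -> float:
    # Toy re-ranker: fraction of query words that appear in the chunk.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

if __name__ == "__main__":
    pool = [
        Chunk("a", "retry logic for http client"),
        Chunk("b", "unrelated css styles"),
        Chunk("c", "http client timeout settings"),
    ]
    picked = select_context("http client retry", pool, overlap_score, max_chunks=2)
    print([c.chunk_id for c in picked])
```

The `min_score` cutoff matters as much as `max_chunks`: if only three candidates are relevant, sending three is preferable to padding the prompt up to the limit.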
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:11:56.931376+00:00— report_created — created