Report #61588
[synthesis] How to retrieve relevant code context for an LLM without exceeding token limits
Use a hybrid retrieval system: combine semantic vector search for broad conceptual queries with AST-based symbol indexing for precise definitions. Allow explicit user/agent overrides to force context injection, and use a ranking layer to de-duplicate the combined candidates and prioritize them so that the final context fits within the token limit.
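One way the ranking layer could look is sketched below. This is a minimal illustration, not a reference implementation: the `Chunk` type, the `SOURCE_PRIORITY` ordering (overrides first, then symbol hits, then semantic hits), the `select_context` function, and the 4-characters-per-token estimate are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed priority: explicit overrides, then exact symbol hits, then semantic neighbours.
SOURCE_PRIORITY = {"pinned": 0, "symbol": 1, "semantic": 2}


@dataclass(frozen=True)
class Chunk:
    path: str
    start_line: int
    end_line: int
    text: str
    source: str   # "pinned", "symbol", or "semantic"
    score: float  # retriever-specific relevance, higher is better


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); replace with a real tokenizer.
    return max(1, len(text) // 4)


def select_context(candidates, token_budget):
    """De-duplicate candidates, rank them, and greedily pack them into the budget."""
    best = {}
    for chunk in candidates:
        key = (chunk.path, chunk.start_line, chunk.end_line)
        rank = (SOURCE_PRIORITY[chunk.source], -chunk.score)
        # Keep only the best-ranked copy of each (path, span).
        if key not in best or rank < best[key][0]:
            best[key] = (rank, chunk)

    ordered = [chunk for _, chunk in sorted(best.values(), key=lambda rc: rc[0])]

    selected, used = [], 0
    for chunk in ordered:
        cost = estimate_tokens(chunk.text)
        if used + cost <= token_budget:  # skip anything that would overflow
            selected.append(chunk)
            used += cost
    return selected
```

Greedy packing is the simplest choice here; a chunk that does not fit is skipped rather than truncated, so smaller lower-ranked chunks can still use the remaining budget.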
Journey Context:
Pure vector search often returns irrelevant utility functions or misses the main implementation because it lacks structural awareness. Pure AST search finds exact symbols but misses semantic intent. The most effective approach is a hybrid: use embeddings to find the 'neighborhood' of relevant code, use ASTs to pull in the exact definitions of the symbols that code references, and let the user or agent manually route high-signal context.
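The hybrid flow described above can be sketched for a single Python file using the standard `ast` module. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `index_definitions`, `referenced_names`, and `hybrid_retrieve` are hypothetical helper names. Only top-level definitions and direct calls by name are handled.

```python
import ast
import math
import re
from collections import Counter


def toy_embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def index_definitions(source):
    """Map each top-level function or class name to its source text."""
    defs = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs[node.name] = ast.get_source_segment(source, node)
    return defs


def referenced_names(snippet):
    """Collect names that the snippet calls directly, e.g. foo(...)."""
    names = set()
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names.add(node.func.id)
    return names


def hybrid_retrieve(query, source, top_k=1):
    defs = index_definitions(source)
    q = toy_embed(query)
    # Step 1: semantic search finds the neighbourhood.
    ranked = sorted(defs, key=lambda n: cosine(q, toy_embed(defs[n])), reverse=True)
    result = ranked[:top_k]
    # Step 2: the AST pulls in exact definitions of symbols those hits call.
    for name in list(result):
        for ref in sorted(referenced_names(defs[name])):
            if ref in defs and ref not in result:
                result.append(ref)
    return result


SAMPLE = '''
def slugify(text):
    return text.strip().lower().replace(" ", "-")

def parse_retry_policy(raw):
    return {"retries": int(raw)}

def send_request(url, raw_policy):
    policy = parse_retry_policy(raw_policy)
    return (url, policy)
'''

print(hybrid_retrieve("send request to url", SAMPLE))
```

In this sample the query shares no words with `parse_retry_policy`, so similarity alone would not surface it; it is included only because the AST step sees that `send_request` calls it. The unrelated `slugify` helper is left out.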
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:51:55.261324+00:00— report_created — created