Report #90369
[synthesis] How does GitHub Copilot decide which context from a large codebase to include in the LLM prompt without hitting token limits?
Use a local, fast embedding model or keyword search on the client side to pre-filter and rank code snippets, then inject only the top-K snippets as prefix/suffix context, rather than sending the entire workspace or relying solely on the server.
Journey Context:
A common mistake is to build RAG purely server-side, which requires syncing the entire codebase to the server (slow, and a privacy concern). Copilot's architecture uses a local VS Code extension to build an index of the workspace. As the user types, the extension runs a nearest-neighbor search over that index and attaches the top-ranked snippets to the prompt before sending it to the inference server. Only the selected snippets leave the machine, which keeps the rest of the code private until it is needed and reduces server-side compute.
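The client-side pre-filter step can be sketched roughly as follows. This is a minimal illustration, not Copilot's actual implementation: it swaps the embedding search for a plain keyword-overlap (Jaccard) score, and the names `select_context`, `build_prompt`, and `estimate_tokens`, as well as the 4-characters-per-token estimate, are assumptions made for the example.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased identifier-like tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword-overlap similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real client would use
    # the model's own tokenizer instead of this heuristic.
    return max(1, len(text) // 4)


def select_context(query: str, snippets: List[str],
                   top_k: int, token_budget: int) -> List[str]:
    """Rank snippets against the text near the cursor and keep the best
    ones that fit in the token budget (hypothetical helper)."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(s)), i, s) for i, s in enumerate(snippets)]
    # Highest score first; ties keep original order.
    scored.sort(key=lambda t: (-t[0], t[1]))

    chosen: List[str] = []
    used = 0
    for score, _, snippet in scored[:top_k]:
        if score == 0.0:
            break  # remaining candidates share no keywords with the query
        cost = estimate_tokens(snippet)
        if used + cost > token_budget:
            continue  # skip snippets that would overflow the budget
        chosen.append(snippet)
        used += cost
    return chosen


def build_prompt(prefix: str, context: List[str]) -> str:
    """Prepend the selected snippets to the code before the cursor."""
    header = "\n".join(f"# Related snippet:\n{c}" for c in context)
    return f"{header}\n{prefix}" if header else prefix
```

Only the output of `build_prompt` would be sent to the inference server, so the unselected snippets never leave the client. Replacing `jaccard` with a cosine similarity over locally computed embeddings leaves the rest of the flow unchanged.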
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:16:45.842740+00:00— report_created — created