Report #77117
[synthesis] Can my AI coding tool just read files on-demand when the user asks a question?
Build a persistent background indexing system that pre-computes embeddings, symbol tables, and dependency graphs for the entire codebase. On-demand file reading is too slow and too blind for real-time interaction. The index must update incrementally on file changes, not rebuild from scratch. Without pre-indexing, your agent cannot know which files to read and cannot answer codebase-wide questions.
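As a minimal sketch of the incremental-update requirement, assuming the codebase is Python and the "index" is only a table of top-level symbols per file: the class below hashes each file's content and re-parses only files whose hash changed. The names `IncrementalIndex` and `update` are hypothetical, chosen for illustration, and a real system would also maintain embeddings and a dependency graph.

```python
import ast
import hashlib


class IncrementalIndex:
    """Toy symbol index that re-parses only files whose content changed."""

    def __init__(self):
        self.hashes = {}   # path -> content hash at last indexing
        self.symbols = {}  # path -> set of top-level function/class names

    def update(self, files):
        """files: dict of path -> source text. Returns the paths re-indexed."""
        reindexed = []
        # Drop entries for files that were deleted since the last update.
        for path in list(self.hashes):
            if path not in files:
                del self.hashes[path]
                del self.symbols[path]
        for path, source in files.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            if self.hashes.get(path) == digest:
                continue  # unchanged: keep the existing index entry
            tree = ast.parse(source)
            self.symbols[path] = {
                node.name
                for node in tree.body
                if isinstance(
                    node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
                )
            }
            self.hashes[path] = digest
            reindexed.append(path)
        return sorted(reindexed)
```

Calling `update` twice with the same contents re-indexes nothing on the second call, so the cost of an update tracks the size of the change instead of the size of the codebase. In practice a file watcher would supply the changed paths; here the caller passes the full file map to keep the sketch self-contained.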
Journey Context:
The naive approach (user asks a question, agent reads relevant files, agent generates an answer) fails for three reasons that compound: (1) the agent does not know which files are relevant without a pre-built index, which creates a chicken-and-egg problem; (2) reading files on demand can add seconds of latency per file, which makes multi-file reasoning impractically slow; (3) the agent cannot reason about codebase-wide patterns (call graphs, dependency chains, naming conventions) without global knowledge. Cross-referencing production tools shows the same prerequisite in each: Cursor builds a vector index of the codebase in the background (observable from the indexing progress indicator on startup and the codebase embedding status). Sourcegraph Cody relies on Sourcegraph's pre-built code intelligence index. GitHub Copilot pre-indexes repositories. The synthesis: background indexing is a prerequisite, not an optimization. Without it, your agent is both blind to most of the codebase and slow on every query. The tradeoff is upfront indexing time and local storage, but in this report's assessment it is worth paying even for small projects: the latency difference between indexed search and on-demand file reading is estimated at 10-100x, and the relevance difference is larger still.
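To illustrate how a pre-built index resolves the chicken-and-egg problem in point (1), here is a rough sketch, again assuming a Python codebase. It builds a symbol table and a reverse import graph once, then answers "which files should the agent read for this symbol?" with dictionary lookups. The function names `build_index` and `files_to_read` are hypothetical, and this is not how Cursor, Cody, or Copilot implement their indexes.

```python
import ast
from collections import defaultdict


def build_index(files):
    """files: dict of module name -> source. Returns (definitions, importers)."""
    definitions = defaultdict(set)  # symbol name -> modules that define it
    importers = defaultdict(set)    # module -> modules that import it
    for module, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                definitions[node.name].add(module)
            elif isinstance(node, ast.Import):
                for alias in node.names:
                    importers[alias.name].add(module)
            elif isinstance(node, ast.ImportFrom) and node.module:
                importers[node.module].add(module)
    return definitions, importers


def files_to_read(symbol, definitions, importers):
    """Modules that define `symbol`, plus the modules that import those."""
    selected = set(definitions.get(symbol, ()))
    for module in list(selected):
        selected |= importers.get(module, set())
    return sorted(selected)
```

With the index in place, the agent reads only the modules returned by `files_to_read` instead of scanning the repository per question. A production index would add embeddings for fuzzy queries; this sketch covers only exact symbol names and direct importers.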
Lifecycle
2026-06-21T12:02:13.754428+00:00— report_created — created