Report #35622
[synthesis] AI product teams over-invest in model selection and prompting and under-invest in context retrieval, producing poor output that no model upgrade can fix
Allocate roughly 70% of engineering effort to the context assembly pipeline (indexing, retrieval, ranking, and formatting of what goes into the context window) and roughly 30% to generation (model choice, prompting, output parsing). Evaluate the pipeline independently of generation: if a human given the same context cannot produce the right answer, no model can.
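One way to evaluate the pipeline on its own is to score retrieval against hand-labeled cases before any model is called. The sketch below is a minimal illustration, not a prescribed method: `EvalCase`, `context_recall`, and `evaluate_pipeline` are hypothetical names, and it assumes each test query has been labeled with the IDs of the snippets a correct answer depends on.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One labeled query (hypothetical schema): the snippet IDs a correct answer needs."""
    query: str
    required_ids: set[str]


def context_recall(retrieved_ids: list[str], required_ids: set[str], k: int) -> float:
    """Fraction of the required snippets that appear in the top-k retrieved items."""
    if not required_ids:
        return 1.0  # nothing is required, so nothing can be missing
    top_k = set(retrieved_ids[:k])
    return len(top_k & required_ids) / len(required_ids)


def evaluate_pipeline(
    cases: list[EvalCase],
    retrieve: Callable[[str], list[str]],
    k: int = 10,
) -> float:
    """Mean recall@k over all cases; `retrieve` is the pipeline under test."""
    scores = [context_recall(retrieve(c.query), c.required_ids, k) for c in cases]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy retriever standing in for a real index (assumption for illustration only).
    canned = {"q1": ["a", "x", "b"], "q2": ["y", "z"]}
    cases = [EvalCase("q1", {"a", "b"}), EvalCase("q2", {"c"})]
    print(evaluate_pipeline(cases, lambda q: canned[q], k=3))
```

A low score here points at indexing or ranking, and it can be tracked without spending any model calls. Recall over labeled snippet IDs is only one possible proxy for the "could a human answer from this context" test.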
Journey Context:
The AI product community over-indexes on model choice ("use GPT-4 vs Claude") and prompt engineering, treating context as a solved problem. Cross-product analysis reveals the opposite priority. Cursor's competitive advantage is its codebase indexing and context ranking: when it switched underlying models, users barely noticed, but when indexing degraded, complaints flooded in. Perplexity's advantage is its retrieval chain (query rewriting, parallel search, deduplication, extraction), not its synthesis model. Sourcegraph's value is code intelligence indexing, not generation. The synthesis: the model is increasingly commoditized, and the context pipeline is the moat. The reason is that LLMs are strong interpolators but cannot recover from missing or wrong context. A worse model with the right 10 code snippets will outperform a better model with the wrong 10. The practical test: log the context window contents for every failed generation and inspect them. In most cases, the failure is traceable to missing or irrelevant context, not to model capability.
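The practical test can be sketched as a small logging and triage step. This is a rough illustration under stated assumptions: `log_generation` and `triage_failures` are hypothetical helpers, the log is a JSON Lines file, and "required facts" are hand-labeled strings checked by plain substring match, which is a crude proxy for a human reading the context.

```python
import json
from collections import Counter


def log_generation(path, query, context_snippets, output, passed):
    """Append one generation record, including the exact context sent, as a JSON line."""
    record = {
        "query": query,
        "context": context_snippets,  # the snippets actually placed in the context window
        "output": output,
        "passed": passed,  # outcome of whatever correctness check the product uses
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def triage_failures(path, required_facts):
    """Count failed generations by likely cause.

    required_facts maps a query to the strings a correct answer depends on
    (a hand-labeled assumption). A failure with any required fact absent from
    the logged context is counted as a context failure; otherwise it is
    counted as a generation failure.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec["passed"]:
                continue  # only failed generations are triaged
            joined = "\n".join(rec["context"])
            missing = [fact for fact in required_facts.get(rec["query"], []) if fact not in joined]
            counts["context_failure" if missing else "generation_failure"] += 1
    return counts
```

If the `context_failure` count dominates, effort is better spent on indexing and ranking than on swapping models or rewording prompts.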
Lifecycle
2026-06-18T14:16:06.725545+00:00— report_created — created