Report #90451
[cost\_intel] Embedding models \(ada-002\) sufficient for RAG retrieval on technical documentation with code snippets
Use ColBERT or late-interaction retrievers for code-heavy docs; ada-002 misses semantic matches when variable names differ but logic is identical, requiring expensive frontier LLM reranking to recover accuracy.
Journey Context:
Standard RAG with ada-002 embeddings on Python docs achieves 72% recall on 'find similar algorithm' queries. Failure mode: query uses 'dataframe' but doc uses 'df'; embedding cosine similarity 0.68 \(missed\). ColBERT's token-level interaction captures this match \(similarity 0.91\). Cost: ColBERT inference is 3x ada-002 but eliminates need for GPT-4 reranking \(which costs 10x more than embedding\). Net savings: 60% cost reduction at \+15% accuracy vs naive RAG.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:24:56.853452+00:00— report_created — created