Report #70705
[cost\_intel] Deploying o1 for real-time entity linking in RAG pipelines
Use embedding-based retrieval with GPT-4o for entity disambiguation; o1 adds 20s of latency for marginal accuracy gains on rare ambiguous entities \(e.g., 'Jordan' the country vs. the person\).
Journey Context:
RAG entity linking is largely a similarity search problem \(cosine similarity over embeddings\) plus a shallow lexical ranker \(BM25\). Reasoning models instead treat it as a world-knowledge reasoning task, deliberating over whether 'Jordan' means Michael Jordan or the country based on context. This is overkill: GPT-4o with few-shot examples handles 95% of cases, and the remaining 5% of ambiguous mentions are better handled by retrieval augmentation \(fetching the disambiguation page\) than by slow reasoning. The 20s latency breaks real-time RAG UX.
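A minimal sketch of the fast-path-plus-augmentation idea, under loud assumptions: the entity descriptions, the `DISAMBIGUATION` text, the `margin` threshold, and the bag-of-words `embed` function are all hypothetical stand-ins. A real pipeline would call an embedding model and fetch an actual disambiguation page. The point is only that an ambiguous mention can trigger a cheap extra retrieval step instead of a slow reasoning call.

```python
import math
from collections import Counter

# Hypothetical mini knowledge base: entity id -> short description.
ENTITIES = {
    "Jordan_(country)": "jordan country middle east amman kingdom river",
    "Michael_Jordan": "michael jordan basketball player chicago bulls nba",
}

# Hypothetical extra context, standing in for a fetched disambiguation page.
DISAMBIGUATION = {
    "Jordan_(country)": "arab nation borders israel syria iraq capital amman visit travel",
    "Michael_Jordan": "athlete six championships scored points guard team",
}


def embed(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(mention_context, descriptions):
    """Return (score, entity_id) pairs sorted from best to worst match."""
    query = embed(mention_context)
    scored = [(cosine(query, embed(d)), eid) for eid, d in descriptions.items()]
    return sorted(scored, reverse=True)


def link(mention_context, margin=0.1):
    """Link a mention; fall back to retrieval augmentation when scores are close."""
    ranked = rank(mention_context, ENTITIES)
    (top_score, top_id), (second_score, _) = ranked[0], ranked[1]
    if top_score - second_score >= margin:
        return top_id, "fast_path"
    # Ambiguous: enrich each candidate with retrieved context and re-rank.
    augmented = {eid: ENTITIES[eid] + " " + DISAMBIGUATION[eid] for eid in ENTITIES}
    return rank(mention_context, augmented)[0][1], "augmented"


if __name__ == "__main__":
    print(link("jordan scored 30 points for the bulls"))
    print(link("we plan to visit jordan"))
```

With this toy data, the first call resolves on the fast path to `Michael_Jordan`, while the second has tied scores, takes the augmented path, and resolves to `Jordan_(country)`. The margin value is arbitrary and would need tuning against real traffic.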
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:15:18.766980+00:00— report_created — created