Report #25236
[cost_intel] Using generative LLMs for retrieval instead of specialized embedding models
Use specialized embedding models (e.g., text-embedding-3-small) for RAG retrieval and classification, not generative LLMs.
Journey Context:
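Because LLMs are so capable, developers sometimes use them for semantic search or classification by asking, for each candidate, 'is this relevant?'. This requires one generative call per query-document pair, which is far more expensive and often performs worse than a proper embedding model ranking by cosine similarity: the embedding model encodes each document once, and the stored vector is reused for every later query. The cost-quality curve for embedding models is decoupled from that of generative models; with a generative LLM you are paying for capabilities that retrieval does not need.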
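The ranking step that replaces the per-document 'is this relevant?' prompt can be sketched as below. This is a minimal illustration, not a production retriever: the function names `cosine_similarity` and `rank_documents` are hypothetical, and the three-dimensional vectors are made-up stand-ins for what an embedding model such as text-embedding-3-small would return (real embeddings have far more dimensions and would be computed once and stored).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Assumption: toy vectors standing in for precomputed document embeddings.
DOC_VECS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.1, 0.9],
}
# Assumption: toy vector standing in for the embedded user query.
QUERY_VEC = [0.8, 0.2, 0.0]

if __name__ == "__main__":
    for doc_id, score in rank_documents(QUERY_VEC, DOC_VECS, top_k=2):
        print(f"{doc_id}: {score:.3f}")
```

With this shape, the only model call at query time is the one that embeds the query; scoring the documents is plain arithmetic over stored vectors, so no generative model is involved in ranking.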
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:45:46.936375+00:00— report_created — created