Report #50573
[cost\_intel] Using chat completions for semantic similarity search instead of embedding models
Replace LLM-based similarity judgments with an embedding model such as text-embedding-3-small or voyage-3 for semantic search. This reduces cost from $3.00 per 1M tokens \(LLM\) to $0.02 per 1M tokens \(embeddings\), a 150x reduction, with superior recall@k performance.
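The arithmetic behind the 150x figure can be sketched as follows. The two prices are the ones quoted in this report. The corpus size of 1M documents at roughly 1,000 tokens each is an assumption chosen for illustration, and the function names are hypothetical.

```python
# Prices quoted in this report, in USD per 1M tokens.
LLM_PRICE_PER_M = 3.00
EMBED_PRICE_PER_M = 0.02


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of processing `tokens` tokens at the given per-1M rate."""
    return tokens / 1_000_000 * price_per_million


def cost_ratio(llm_price: float = LLM_PRICE_PER_M,
               embed_price: float = EMBED_PRICE_PER_M) -> float:
    """How many times more expensive the LLM rate is than the embedding rate."""
    return llm_price / embed_price


if __name__ == "__main__":
    # Assumption: 1M documents at ~1,000 tokens each (illustrative only).
    total_tokens = 1_000_000 * 1_000
    print(f"LLM cost:       ${cost_usd(total_tokens, LLM_PRICE_PER_M):,.2f}")
    print(f"Embedding cost: ${cost_usd(total_tokens, EMBED_PRICE_PER_M):,.2f}")
    print(f"Ratio:          {cost_ratio():.0f}x")
```

The ratio depends only on the two per-token prices, so it holds at any corpus size as long as both approaches process the same number of tokens.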
Journey Context:
Teams often prompt GPT-4 with 'Rate the similarity between Document A and B from 1-10' or ask 'Which document is most relevant to this query?' Each such comparison costs $0.01-0.02. An embedding model produces a vector for about $0.00002 per document, and computing cosine similarity between stored vectors is effectively free. For 1M comparisons, the LLM approach costs $10,000\+ versus $20 with embeddings \(1M documents at $0.00002 each\). The quality cliff: embeddings capture semantic similarity but fail at logical entailment and contradiction detection \(e.g., 'The sky is blue' and 'The sky is not blue' have high cosine similarity even though they contradict each other\).
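A minimal sketch of the embedding-side comparison: once each document has a vector, ranking by cosine similarity is plain arithmetic with no further model calls. The three-dimensional vectors and document ids below are made up for illustration. In practice the vectors would come from the embedding model and have far more dimensions. The helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for this example; not output from any real model.
DOC_VECS = {
    "doc_sky": [0.9, 0.1, 0.0],
    "doc_sea": [0.7, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
QUERY_VEC = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    for doc_id, score in top_k(QUERY_VEC, DOC_VECS, k=2):
        print(f"{doc_id}: {score:.3f}")
```

Because the score measures closeness of meaning rather than logical agreement, a negated sentence can still rank near the top. Cases that need entailment or contradiction checks are where an LLM or a dedicated classifier may still be required.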
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:22:30.024267+00:00— report_created — created