Report #74710
[cost\_intel] Embedding vs LLM classification: where is the cost cliff?
For binary or multiclass tasks with >5k labeled samples, embeddings \(text-embedding-3\) \+ logistic regression deliver roughly 95% of GPT-4 accuracy at about 0.1% of the cost \(about $0.00003 vs $0.03 per classification, consistent with the $3 vs $3000 per 100k figures below\). An LLM is only needed for <100 samples or highly subjective labels \(sarcasm, implicit toxicity\).
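The cost gap can be checked with a little arithmetic. The sketch below takes the daily totals quoted in this report as given: about $3000 for GPT-4 and about $3 for embeddings at 100k classifications/day. The variable and function names are illustrative, and real prices depend on the model, token counts per input, and current provider pricing.

```python
# Daily totals quoted in this report (assumed, not measured here).
DAILY_VOLUME = 100_000      # classifications per day
LLM_DAILY_USD = 3000.0      # GPT-4 classification
EMBED_DAILY_USD = 3.0       # embeddings + lightweight classifier

# Unit costs implied by those totals.
llm_per_item = LLM_DAILY_USD / DAILY_VOLUME
embed_per_item = EMBED_DAILY_USD / DAILY_VOLUME
cost_ratio = embed_per_item / llm_per_item


def daily_cost(volume: int, per_item: float) -> float:
    """Daily spend in USD at a given volume and unit cost."""
    return volume * per_item


def daily_savings(volume: int) -> float:
    """USD saved per day by using embeddings instead of the LLM."""
    return daily_cost(volume, llm_per_item) - daily_cost(volume, embed_per_item)


if __name__ == "__main__":
    print(f"LLM per classification:       ${llm_per_item:.5f}")
    print(f"Embeddings per classification: ${embed_per_item:.5f}")
    print(f"Embeddings as share of LLM cost: {cost_ratio:.2%}")
    for volume in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{volume:>9} per day -> saves ${daily_savings(volume):,.2f}/day")
```

Because both costs scale linearly with volume in this simple model, the ratio stays fixed and only the absolute savings grow. The sketch ignores one-time labeling and training effort, which matters most at low volume.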
Journey Context:
Teams reach for LLM classification because 'it's one API call.' But at 100k classifications/day, GPT-4 costs about $3000/day vs about $3/day for embeddings. The process: embed the training set, train a lightweight classifier \(even k-NN with cosine similarity works\), embed each inference batch, and predict. Latency drops from about 500ms to about 50ms. A quality gap remains on nuanced sentiment \(sarcasm detection\) and in few-shot regimes \(<100 examples\), where LLM reasoning generalizes better. For factual categorization \(topic, intent, spam, product classification\), embeddings suffice and keep improving with more labeled data, whereas LLM performance plateaus.
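The embed, train, embed, predict loop described above can be sketched as follows, using the k-NN with cosine similarity variant rather than logistic regression. This is a minimal illustration under stated assumptions: `toy_embed` is a hypothetical placeholder \(a hashed bag of words\) standing in for a real embedding model such as text-embedding-3, and `KnnClassifier` is an illustrative name, not a library class. In practice the `embed` callable would wrap a batched call to an embedding API.

```python
import math
import zlib
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str, dim: int = 4096) -> Vector:
    """Placeholder embedder (hashed bag of words), NOT a real embedding model.

    Assumption for illustration only: swap in a real embedding call here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class KnnClassifier:
    """k-nearest-neighbour classifier over embedding vectors (hypothetical helper)."""

    def __init__(self, embed: Callable[[str], Vector], k: int = 3):
        self.embed = embed
        self.k = k
        self.index: List[Tuple[Vector, str]] = []

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "KnnClassifier":
        # Embed the training set once and keep the vectors with their labels.
        self.index = [(self.embed(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, texts: Sequence[str]) -> List[str]:
        predictions = []
        for text in texts:
            query = self.embed(text)  # embed the inference input
            # Rank stored examples by similarity and vote among the top k.
            ranked = sorted(
                self.index, key=lambda item: cosine(query, item[0]), reverse=True
            )
            votes = Counter(label for _, label in ranked[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


if __name__ == "__main__":
    train_texts = [
        "win a free prize now",
        "free money click now",
        "claim your free prize",
        "meeting moved to friday",
        "lunch at noon tomorrow",
        "project review on friday",
    ]
    train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
    clf = KnnClassifier(toy_embed, k=3).fit(train_texts, train_labels)
    print(clf.predict(["free prize inside", "team meeting on friday"]))
```

The brute-force scan over every stored vector is fine for a sketch but scales linearly with training-set size. At the >5k sample sizes discussed here, a trained linear model such as logistic regression or an approximate nearest-neighbour index would likely be the more practical choice.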
Lifecycle
2026-06-21T08:00:02.666335+00:00— report_created — created