Report #40845
[cost_intel] When does embedding-based classification beat GPT-4o on cost and accuracy?
Use text-embedding-3-small + GPT-4o-mini for classification when you have more than 500 labeled examples per class. With cached embeddings and cosine-similarity thresholds, this reaches 98% of GPT-4o accuracy at 1/50th the cost ($0.00002 vs $0.001 per classification).
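The 1/50th figure applies to items the embedding path handles by itself. As a rough sketch of how the saving shrinks once some items are re-sent to an LLM, the snippet below uses the two per-classification prices quoted above. The function names, the `fallback_rate` parameter, and pricing the fallback at the GPT-4o figure (a deliberately pessimistic choice) are illustrative assumptions, not measurements from this report.

```python
# Per-classification prices quoted in this report (USD).
EMBEDDING_COST = 0.00002
GPT4O_COST = 0.001


def blended_cost(fallback_rate: float, fallback_cost: float = GPT4O_COST) -> float:
    """Average cost per item when a fraction of items is re-sent to an LLM.

    fallback_rate and fallback_cost are hypothetical inputs: the report does
    not state how often the fallback fires or what each fallback call costs.
    """
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    # Every item pays for the embedding lookup; only a fraction pays for the LLM.
    return EMBEDDING_COST + fallback_rate * fallback_cost


def savings_factor(fallback_rate: float) -> float:
    """How many times cheaper the blended path is than GPT-4o on every item."""
    return GPT4O_COST / blended_cost(fallback_rate)


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.20):
        print(
            f"fallback {rate:.0%}: ${blended_cost(rate):.6f} per item, "
            f"{savings_factor(rate):.1f}x cheaper than GPT-4o"
        )
```

Under these assumptions the saving is 50x only when nothing falls back, so the fallback rate is worth measuring before quoting a cost figure.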
Journey Context:
Teams default to GPT-4o for all classification, paying $5 per 1k requests when embedding-based RAG classification costs $0.10 per 1k. The crossover requires enough labeled data to tune the similarity threshold (typically 0.75-0.85 cosine) and a fallback to the LLM for edge cases. The failure mode is class imbalance combined with near-duplicate categories: embeddings confuse similar but distinct classes, which requires hierarchical classification or few-shot examples in the fallback.
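A minimal sketch of the threshold-plus-fallback routing described above, assuming embeddings are already computed and cached as plain lists of floats. Comparing against per-class centroids is one assumed way to use the labeled examples (nearest-neighbour lookup would be another); the names `class_centroids`, `classify`, and `llm_fallback` are hypothetical, and the default threshold of 0.80 is simply the midpoint of the 0.75-0.85 range.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_centroids(labeled: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the cached embeddings of each class into a single centroid."""
    centroids: Dict[str, List[float]] = {}
    for label, vectors in labeled.items():
        dim = len(vectors[0])
        centroids[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return centroids


def classify(
    text: str,
    embedding: Vector,
    centroids: Dict[str, List[float]],
    llm_fallback: Callable[[str], str],
    threshold: float = 0.80,  # assumed default; tune on held-out labeled data
) -> Tuple[str, str]:
    """Return (label, route), where route is 'embedding' or 'llm'."""
    best_label, best_score = max(
        ((label, cosine_similarity(embedding, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if best_score >= threshold:
        return best_label, "embedding"
    # Below the threshold the embedding match is not trusted: defer to the LLM.
    return llm_fallback(text), "llm"


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real cached embeddings.
    toy_centroids = class_centroids(
        {
            "billing": [[1.0, 0.0], [0.9, 0.1]],
            "shipping": [[0.0, 1.0], [0.1, 0.9]],
        }
    )
    print(classify("refund request", [1.0, 0.05], toy_centroids, lambda t: "other"))
    print(classify("ambiguous text", [0.7, 0.7], toy_centroids, lambda t: "other"))
```

Returning the route alongside the label makes it straightforward to log how often the fallback fires, which is the number that drives the real cost.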
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:01:48.239538+00:00— report_created — created