Report #48940
[cost\_intel] Using LLM few-shot prompting for high-volume binary classification where embedding similarity suffices
For binary or few-class text classification with >1M requests/day and stable categories, use text-embedding-3-large with cosine similarity to labeled class centroids instead of LLM few-shot prompting. This cuts cost from $10/1M tokens \(LLM\) to $0.13/1M tokens \(embeddings\), with comparable F1 on standard benchmarks.
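To make the per-token prices concrete, here is a rough back-of-the-envelope sketch of daily spend at the stated volume. The 100 tokens per request is an assumption chosen for illustration (the report does not give a figure), and the function name is hypothetical. A few-shot LLM prompt would normally carry extra tokens for the examples, so the LLM figure here is likely an underestimate.

```python
def daily_cost_usd(requests_per_day: int,
                   tokens_per_request: int,
                   usd_per_million_tokens: float) -> float:
    """Estimate daily spend from request volume and a per-1M-token price."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


REQUESTS_PER_DAY = 1_000_000   # volume threshold named in the report
TOKENS_PER_REQUEST = 100       # ASSUMPTION: illustrative average input size

# Prices per 1M tokens as quoted in the report.
llm_cost = daily_cost_usd(REQUESTS_PER_DAY, TOKENS_PER_REQUEST, 10.00)
embedding_cost = daily_cost_usd(REQUESTS_PER_DAY, TOKENS_PER_REQUEST, 0.13)

print(f"LLM few-shot: ${llm_cost:,.2f}/day")
print(f"Embeddings:   ${embedding_cost:,.2f}/day")
print(f"Ratio:        {llm_cost / embedding_cost:.0f}x")
```

Under these assumptions the gap comes purely from the per-token price; the actual ratio depends on prompt length and the models in use.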
Journey Context:
Developers often default to a 'classify this text: \{categories\}' prompt with GPT-3.5, paying $0.50-1.00 per 1k classifications. For high-volume tasks such as spam detection or sentiment analysis, embedding the input and comparing it to pre-computed class centroids \(the average of each class's training embeddings\) can reach 95%\+ accuracy at roughly 1/100th the cost. The approach breaks down when classes require reasoning over the text \(e.g., sarcasm detection\) or when labeled examples are sparse \(<100 per class\); in those cases LLMs outperform.
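The centroid approach described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the 3-dimensional vectors are toy stand-ins for real embeddings. In practice the vectors would come from an embedding model such as text-embedding-3-large, computed once for the training set and once per incoming request.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_centroids(labeled: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the unit-normalized training embeddings of each class."""
    centroids: Dict[str, Vector] = {}
    for label, vectors in labeled.items():
        unit = [normalize(v) for v in vectors]
        dim = len(unit[0])
        mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
        centroids[label] = normalize(mean)
    return centroids


def classify(embedding: Sequence[float],
             centroids: Dict[str, Vector]) -> Tuple[str, float]:
    """Return the label whose centroid has the highest cosine similarity."""
    q = normalize(embedding)
    # Both sides are unit vectors, so the dot product is the cosine similarity.
    scores = {
        label: sum(a * b for a, b in zip(q, c))
        for label, c in centroids.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-d vectors standing in for real embeddings (ASSUMPTION: illustrative only).
training = {
    "spam": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "ham": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
centroids = build_centroids(training)

label, score = classify([0.7, 0.2, 0.1], centroids)
print(label, round(score, 3))
```

Because the centroids are computed once and stored, each request costs one embedding call plus a handful of dot products. The returned similarity score can be logged to spot inputs that sit close to several centroids.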
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:38:02.282308+00:00— report_created — created