Report #55902
[cost_intel] When do embedding-based classifiers beat reasoning models on classification cost-accuracy curves?
For classification tasks with more than 100 labeled examples, use text-embedding-3-large + logistic regression or k-NN instead of few-shot prompting with o3. This reaches roughly 95% accuracy at $0.001/query, versus $0.50/query for reasoning models.
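To make the cost gap concrete, here is a minimal arithmetic sketch using only the per-query prices quoted above. The monthly volume of 1,000,000 queries is a hypothetical figure chosen for illustration, not a number from this report.

```python
# Per-query prices as quoted in this report (USD).
EMBEDDING_COST_PER_QUERY = 0.001
REASONING_COST_PER_QUERY = 0.50


def total_cost(cost_per_query: float, n_queries: int) -> float:
    """Total spend in USD for n_queries at a flat per-query price."""
    return cost_per_query * n_queries


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per query."""
    return expensive / cheap


if __name__ == "__main__":
    monthly_queries = 1_000_000  # hypothetical volume, for illustration only
    ratio = cost_ratio(REASONING_COST_PER_QUERY, EMBEDDING_COST_PER_QUERY)
    print(f"cost ratio: {ratio:.0f}x")
    print(f"embeddings: ${total_cost(EMBEDDING_COST_PER_QUERY, monthly_queries):,.2f}")
    print(f"reasoning:  ${total_cost(REASONING_COST_PER_QUERY, monthly_queries):,.2f}")
```

At flat per-query pricing the ratio does not depend on volume, so the 500x gap holds at any scale. What changes with volume is whether the absolute difference is large enough to matter.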
Journey Context:
Reasoning models treat few-shot classification as 'description generation' and burn tokens explaining their reasoning. An embedding-based classifier instead builds a vector index of the labeled examples and classifies each input by nearest-neighbor lookup, with no 'thinking' required. On banking transaction categorization (100 categories), embedding + classifier achieves 94% F1, while o3 achieves 96% but costs 500x more. The signature of a 'use embeddings' task is: a fixed number of classes, more than 50 training examples per class, and semantic similarity between the input and the class names.
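The nearest-neighbor lookup described above can be sketched in a few lines. This is an illustrative sketch, not the setup behind the numbers in this report. The embedding function is injected as a parameter, and `toy_embed` is a hypothetical hashed bag-of-words stand-in so the example runs offline. In practice you would pass a function that calls your embedding model (for example text-embedding-3-large) and probably use a library classifier instead of this hand-rolled k-NN.

```python
import hashlib
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical placeholder embedder: hashed bag of words.

    Stands in for a real embedding model so the sketch runs offline.
    It captures token overlap only, not semantic similarity.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class KNNClassifier:
    """k-nearest-neighbor classifier over embedded labeled examples."""

    def __init__(self, embed: Callable[[str], Vector], k: int = 3) -> None:
        self.embed = embed
        self.k = k
        self.index: List[Tuple[Vector, str]] = []

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "KNNClassifier":
        # Embed every labeled example once; this list is the "vector index".
        self.index = [(self.embed(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text: str) -> str:
        # Embed the query, take the k most similar examples, majority vote.
        query = self.embed(text)
        nearest = sorted(
            self.index, key=lambda item: cosine(query, item[0]), reverse=True
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Tiny made-up training set; real use needs many examples per class.
    texts = ["coffee shop downtown", "electric bill payment", "coffee and bagel"]
    labels = ["dining", "utilities", "dining"]
    clf = KNNClassifier(toy_embed, k=1).fit(texts, labels)
    print(clf.predict("coffee shop"))
```

Each prediction costs one embedding call plus a similarity scan over the index, which is where the per-query saving comes from. The linear scan is fine for a few thousand examples. Larger indexes would typically use an approximate nearest-neighbor library, which is outside the scope of this sketch.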
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:19:31.331464+00:00— report_created — created