Report #83265
[cost_intel] Using frontier models with few-shot prompting for high-volume classification instead of fine-tuned small models
Fine-tune GPT-4o-mini for binary classification; it beats few-shot GPT-4o by 2-4 F1 points at 1/50th the per-classification cost, with the upfront fine-tuning cost breaking even at ~100k classifications
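The break-even figure above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the per-call price, the fixed cost, and the function name `break_even_volume` are hypothetical, chosen so the numbers line up with the 1/50th ratio and the ~100k figure; they are not measurements from this report. Costs are held as integer micro-dollars to avoid floating-point rounding.

```python
def break_even_volume(fixed_cost_micro: int,
                      frontier_per_call_micro: int,
                      tuned_per_call_micro: int) -> int:
    """Smallest number of classifications at which the fine-tuned model's
    total cost (fixed + per-call) is no higher than the frontier model's.

    All costs are integer micro-dollars (1 USD = 1_000_000).
    """
    saving_per_call = frontier_per_call_micro - tuned_per_call_micro
    if saving_per_call <= 0:
        raise ValueError("fine-tuned model must be cheaper per call to break even")
    # Ceiling division using integers only.
    return -(-fixed_cost_micro // saving_per_call)


# Hypothetical inputs, NOT taken from the report:
frontier_per_call = 5_000                   # $0.005 per few-shot classification
tuned_per_call = frontier_per_call // 50    # 1/50th of the per-call cost
fixed_cost = 490 * 1_000_000                # $490 for training plus data curation

volume = break_even_volume(fixed_cost, frontier_per_call, tuned_per_call)
print(volume)  # 100000
```

Under these assumptions the fixed cost scales the break-even point linearly: doubling the curation and training spend to $980 doubles the break-even volume to 200,000 classifications.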
Journey Context:
Frontier models excel at zero-shot generalization but suffer from prompt instability and high latency. Fine-tuned mini models compress domain knowledge into their weights, so the prompt no longer needs few-shot examples; this shrinks the input to the text being classified and reduces latency by 80%. The main failure mode is distribution shift, so production traffic must be monitored for concept drift. The hidden cost is training data curation: the break-even estimate assumes clean labeled data is already available.
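One way to watch for the drift mentioned above is to hand-label a small audit sample of production traffic at intervals and compare the model's F1 on that sample against the F1 measured at deployment. This is a minimal sketch, assuming such labels are available; the function names, the 0.03 tolerance, and the sample data are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for labels in {0, 1}, computed as 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def drift_alert(baseline_f1: float,
                y_true: Sequence[int],
                y_pred: Sequence[int],
                tolerance: float = 0.03) -> bool:
    """True when F1 on the audit sample has dropped more than `tolerance`
    below the baseline measured at deployment time."""
    return baseline_f1 - f1_score(y_true, y_pred) > tolerance


# Hypothetical audit sample: human labels vs. model predictions.
audit_labels = [1, 1, 1, 0, 0, 0, 1, 0]
audit_preds = [1, 0, 0, 0, 0, 1, 1, 0]

if drift_alert(baseline_f1=0.90, y_true=audit_labels, y_pred=audit_preds):
    print("F1 dropped below baseline; consider relabeling and retraining")
```

A real audit sample would need to be far larger than eight items for the F1 estimate to be stable; the tiny list here only keeps the example readable.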
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:20:43.257002+00:00— report_created — created