Report #52177
[cost_intel] When fine-tuning beats GPT-4 few-shot prompting on cost per query
For tasks with more than 500 daily queries, fine-tune GPT-3.5-Turbo instead of prompting GPT-4 with few-shot examples; this reduces cost by about 90% with less than 2% accuracy degradation.
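A minimal per-query cost sketch shows where the savings come from: the few-shot examples are billed as extra input tokens on every call. The prices below are hypothetical placeholders, not actual OpenAI list prices, and the `Pricing`, `cost_per_query`, and `cost_reduction` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1K tokens. Values used below are hypothetical placeholders."""
    input_per_1k: float
    output_per_1k: float


def cost_per_query(p: Pricing, input_tokens: int, output_tokens: int,
                   example_tokens: int = 0) -> float:
    """USD for one request; example_tokens are few-shot examples resent on every call."""
    billed_input = input_tokens + example_tokens
    return (billed_input * p.input_per_1k + output_tokens * p.output_per_1k) / 1000


def cost_reduction(baseline: float, candidate: float) -> float:
    """Fraction of the baseline per-query cost saved by switching to the candidate."""
    return 1 - candidate / baseline


if __name__ == "__main__":
    few_shot = Pricing(input_per_1k=0.03, output_per_1k=0.06)      # hypothetical prices
    fine_tuned = Pricing(input_per_1k=0.003, output_per_1k=0.006)  # hypothetical prices
    a = cost_per_query(few_shot, input_tokens=200, output_tokens=20, example_tokens=500)
    b = cost_per_query(fine_tuned, input_tokens=200, output_tokens=20)
    print(f"few-shot ${a:.5f}, fine-tuned ${b:.5f}, saved {cost_reduction(a, b):.1%}")
```

This models cost only; the accuracy side of the trade-off still has to be measured on your own evaluation set.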
Journey Context:
Teams default to GPT-4 with 5-shot examples for classification or extraction tasks. The examples are resent with every request, so at 500 queries/day they bloat token counts (500 tokens of examples × 500 queries = 250k extra input tokens/day). Fine-tuning bakes the examples into the weights, so each request carries only its own input tokens. Break-even is 300-500 queries/day, depending on input length. Fine-tuned models also have lower latency.

Common mistake: fine-tuning on fewer than 100 examples, which fails to beat few-shot prompting; complex tasks need 500+ examples.
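One way to reason about the break-even volume is sketched below, assuming the one-time fine-tuning cost is spread evenly over a chosen horizon and compared against the per-query savings. The report does not say which fixed costs its 300-500 queries/day figure includes, so `fixed_cost_usd`, `amortization_days`, and the per-query costs in the example are hypothetical inputs, and the function names are illustrative. Longer inputs raise the per-query savings when the fine-tuned model is cheaper per token, which lowers the break-even volume.

```python
import math


def daily_example_overhead(example_tokens: int, queries_per_day: int) -> int:
    """Input tokens per day spent resending the same few-shot examples."""
    return example_tokens * queries_per_day


def break_even_queries_per_day(few_shot_cost: float, fine_tuned_cost: float,
                               fixed_cost_usd: float, amortization_days: int) -> float:
    """Smallest daily query volume at which per-query savings cover the
    fine-tuning fixed cost spread evenly over amortization_days.

    few_shot_cost and fine_tuned_cost are USD per query.
    """
    savings = few_shot_cost - fine_tuned_cost
    if savings <= 0:
        return math.inf  # fine-tuned model is not cheaper per query: never breaks even
    return math.ceil(fixed_cost_usd / amortization_days / savings)


if __name__ == "__main__":
    print(daily_example_overhead(500, 500))  # 250000, the 250k tokens/day above
    # Hypothetical inputs: per-query costs in USD, a $100 training run, 30-day horizon.
    print(break_even_queries_per_day(0.0222, 0.00072, fixed_cost_usd=100.0, amortization_days=30))
```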
Lifecycle
2026-06-19T18:04:22.249361+00:00— report_created — created