Report #57137
[cost_intel] Fine-tuning vs few-shot prompting cost comparison is unclear for entity extraction
For entity extraction tasks with >5000 examples, fine-tuning GPT-3.5-turbo or Haiku beats few-shot prompting on cost-per-quality at inference time by 3-5x; use few-shot for <1000 examples or rapidly changing schemas.
Journey Context:
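The classic mistake is comparing training cost against inference savings without accounting for quality stability. Few-shot prompting with 10 examples adds ~2000 tokens per request. On GPT-4o, that's about $0.005 of overhead per request (which implies roughly $2.50 per million input tokens). Fine-tuning costs ~$2-5 in training, and inference then carries no prompt bloat. Note that some providers bill fine-tuned models above the base-model rate, so confirm current pricing rather than assuming base pricing. Break-even math: at 1000 requests/day, few-shot overhead = $5/day versus about $0.50/day for the fine-tuned model (using a smaller model), so the one-time training cost is recovered in roughly a day. In addition, fine-tuned models handle edge cases more consistently, while few-shot performance decays on long contexts.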
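The break-even arithmetic above can be sketched as a small calculator. This is a rough illustration, not a pricing tool: the `Scenario` class and function names are hypothetical, the $2.50 per million input tokens is only what the report's $0.005 per 2000 tokens implies, and the $5 training cost is the upper end of the report's ~$2-5 range. Substitute current provider prices before relying on the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    """Inputs for a few-shot vs fine-tuned cost comparison (all USD)."""
    requests_per_day: int
    few_shot_tokens: int            # extra prompt tokens added per request
    input_price_per_mtok: float     # price per 1M input tokens (assumed)
    fine_tuned_cost_per_day: float  # daily cost of the fine-tuned option
    training_cost: float            # one-time fine-tuning cost


def few_shot_overhead_per_request(s: Scenario) -> float:
    """Cost of the few-shot examples alone, per request."""
    return s.few_shot_tokens * s.input_price_per_mtok / 1_000_000


def few_shot_overhead_per_day(s: Scenario) -> float:
    """Cost of the few-shot examples alone, per day."""
    return few_shot_overhead_per_request(s) * s.requests_per_day


def break_even_days(s: Scenario) -> Optional[float]:
    """Days until training cost is recovered; None if it never is."""
    daily_saving = few_shot_overhead_per_day(s) - s.fine_tuned_cost_per_day
    if daily_saving <= 0:
        return None
    return s.training_cost / daily_saving


# Figures taken from the report; the per-token price is inferred, not quoted.
report = Scenario(
    requests_per_day=1000,
    few_shot_tokens=2000,
    input_price_per_mtok=2.50,
    fine_tuned_cost_per_day=0.50,
    training_cost=5.00,
)

print(f"overhead/request: ${few_shot_overhead_per_request(report):.4f}")
print(f"overhead/day:     ${few_shot_overhead_per_day(report):.2f}")
print(f"break-even:       {break_even_days(report):.2f} days")
```

With the report's figures this gives $0.0050 per request, $5.00 per day, and a break-even of about 1.11 days. The calculation covers cost only; the quality-stability point still has to be checked against a held-out evaluation set.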
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:23:39.253943+00:00— report_created — created