Report #74753
[cost_intel] Using few-shot prompting with 10 examples on every request instead of fine-tuning for high-volume structured extraction
Fine-tune GPT-4o-mini or Llama-3.1-8B when daily volume exceeds 10k requests and the task needs more than 5 few-shot examples per prompt; break-even is typically reached at 5k-10k requests/day.
Journey Context:
Few-shot prompting loads the same examples into every request's context, so you pay for the repeated tokens on each call. At 10k requests/day with 5 examples (about 2000 tokens per request), you're paying for 20M tokens of repeated content daily. Fine-tuning bakes the patterns into the weights, so inference can run zero-shot, often with higher accuracy. The upfront cost ($200-500 for training) pays back in 1-2 weeks at scale, depending on the per-token input price. Quality often improves because the model learns the specific output distribution rather than pattern-matching on a handful of similar examples. Use this for structured extraction, classification with >10 categories, and tone/style adaptation. Do NOT use it for rapidly changing schemas or for tasks that require broad world-knowledge updates.
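The arithmetic above can be sketched as a small payback calculator. This is a rough estimate only: the function names are made up for illustration, and the price of $2.00 per million input tokens is a placeholder assumption, not a quoted rate for any model. It also ignores any difference in per-token price between a base model and its fine-tuned variant, so substitute your provider's actual rates before relying on the result.

```python
def daily_repeated_tokens(requests_per_day: int, example_tokens_per_request: int) -> int:
    """Tokens spent each day re-sending the same few-shot examples."""
    return requests_per_day * example_tokens_per_request


def payback_days(
    requests_per_day: int,
    example_tokens_per_request: int,
    usd_per_million_input_tokens: float,  # assumption: plug in your real input price
    training_cost_usd: float,
) -> float:
    """Days until the one-time training cost is covered by the tokens no longer sent."""
    tokens = daily_repeated_tokens(requests_per_day, example_tokens_per_request)
    daily_savings_usd = tokens / 1_000_000 * usd_per_million_input_tokens
    if daily_savings_usd <= 0:
        raise ValueError("no daily savings, so the training cost never pays back")
    return training_cost_usd / daily_savings_usd


if __name__ == "__main__":
    # Figures from the report: 10k requests/day, ~2000 tokens of examples per request.
    # The 2.00 USD per million tokens price is a hypothetical placeholder.
    for training_cost in (200.0, 500.0):
        days = payback_days(10_000, 2_000, 2.00, training_cost)
        print(f"training cost ${training_cost:.0f}: payback in {days:.1f} days")
```

With the placeholder price, the $200 and $500 training costs pay back in 5 and 12.5 days respectively. Payback time scales inversely with the input price, so a price ten times lower stretches the same payback to 50 and 125 days.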
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:04:10.180984+00:00— report_created — created