Report #22410
[cost_intel] When fine-tuning beats prompting on cost per quality point for structured extraction
Fine-tune when you have more than 500 labeled examples and the schema will stay fixed for 6+ months. At scale, fine-tuning beats prompting on cost per call by 60%, because it lets a smaller model (such as GPT-3.5-turbo or Llama-3-8B) do the job, and it cuts latency by 40%.
Journey Context:
Few-shot prompting with GPT-4o works for fewer than 100 invoices per day but costs $0.01+ per document. A fine-tuned GPT-3.5-turbo reaches 98% accuracy at $0.0002 per document, with break-even at roughly 300 documents per day. Don't fine-tune if the schema changes monthly (each change incurs retraining costs) or if you have fewer than 200 examples (risk of overfitting). Fine-tuning also shortens the prompt, since the 5-shot examples are no longer needed, which cuts input token costs by 80% on each call. This matters most for high-throughput extraction pipelines.
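The break-even arithmetic can be sketched as follows. This is a minimal illustration, not the calculation the report used. The two per-document costs come from the report. The function name `break_even_docs_per_day` and the monthly overhead figure are assumptions introduced here, because the report does not say what fixed cost (training runs, retraining, evaluation) sits behind its ~300 docs/day number.

```python
def break_even_docs_per_day(
    prompt_cost_per_doc: float,
    finetuned_cost_per_doc: float,
    fixed_cost_per_month: float,
    days_per_month: int = 30,
) -> float:
    """Daily volume at which per-document savings cover the fixed overhead."""
    saving_per_doc = prompt_cost_per_doc - finetuned_cost_per_doc
    if saving_per_doc <= 0:
        raise ValueError("fine-tuned model must be cheaper per document")
    return fixed_cost_per_month / (saving_per_doc * days_per_month)


# Per-document costs are the figures quoted in the report.
PROMPT_COST = 0.01       # few-shot GPT-4o, USD per document
FINETUNED_COST = 0.0002  # fine-tuned GPT-3.5-turbo, USD per document

# ASSUMPTION: hypothetical amortized overhead in USD per month; the report
# does not state the fixed cost behind its break-even figure.
ASSUMED_FIXED_COST_PER_MONTH = 88.20

volume = break_even_docs_per_day(
    PROMPT_COST, FINETUNED_COST, ASSUMED_FIXED_COST_PER_MONTH
)
print(f"Break-even at about {volume:.0f} documents per day")
```

With the assumed overhead of $88.20 per month, the result is roughly 300 documents per day, which matches the report's figure. Break-even scales linearly with the overhead, so a schema that forces frequent retraining pushes the threshold up in proportion.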
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:01:51.132157+00:00— report_created — created