Report #44490
[cost_intel] Fine-tuning ROI negative below 500 examples or shallow schemas
Fine-tune GPT-4o-mini or Haiku only when you have >500 diverse examples, >10 nested JSON fields, and >50k expected monthly calls; otherwise, frontier few-shot CoT with GPT-4o is cheaper and more robust to schema drift.
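The rule above can be written as a simple gate. This is a minimal sketch, assuming the three thresholds are applied together as stated. The names `Workload` and `should_fine_tune` are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an extraction workload."""
    diverse_examples: int    # labeled, diverse training examples available
    nested_json_fields: int  # nested fields in the target JSON schema
    monthly_calls: int       # expected calls per month


def should_fine_tune(w: Workload) -> bool:
    """Return True only when all three thresholds from the report are exceeded."""
    return (
        w.diverse_examples > 500
        and w.nested_json_fields > 10
        and w.monthly_calls > 50_000
    )


# Illustrative inputs only; these workloads are made up.
print(should_fine_tune(Workload(800, 14, 60_000)))
print(should_fine_tune(Workload(300, 14, 60_000)))
```

A workload that fails any one check falls back to few-shot prompting with the frontier model.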
Journey Context:
Teams often fine-tune for latency or cost while ignoring the fixed training cost ($30-300) and the ongoing maintenance burden. For structured extraction, fine-tuned small models show 5-10% accuracy gains over prompting only when the schema has deep nesting or domain-specific terminology (e.g., medical coding). With <500 examples, overfitting produces higher error rates than zero-shot frontier models.
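To make the fixed-cost argument concrete, here is a rough break-even sketch. It assumes the only costs are the one-time training cost, a per-call inference cost for each option, and an optional flat monthly maintenance cost. The function name and the per-call prices in the usage lines are hypothetical placeholders, not quoted vendor pricing.

```python
from typing import Optional


def months_to_break_even(
    training_cost_usd: float,
    monthly_calls: int,
    frontier_cost_per_call_usd: float,
    tuned_cost_per_call_usd: float,
    monthly_maintenance_usd: float = 0.0,
) -> Optional[float]:
    """Months until per-call savings repay the one-time training cost.

    Returns None when the fine-tuned model never pays for itself,
    i.e. when the monthly saving is zero or negative.
    """
    per_call_saving = frontier_cost_per_call_usd - tuned_cost_per_call_usd
    monthly_saving = monthly_calls * per_call_saving - monthly_maintenance_usd
    if monthly_saving <= 0:
        return None
    return training_cost_usd / monthly_saving


# Placeholder numbers: $300 training cost (upper end of the range above),
# 50k calls per month, and made-up per-call prices.
print(months_to_break_even(300, 50_000, 0.002, 0.001))
# Same prices at 5k calls per month with $20 of monthly maintenance.
print(months_to_break_even(300, 5_000, 0.002, 0.001, monthly_maintenance_usd=20))
```

With these placeholder prices the first case repays the training cost in about six months, while the second never does because maintenance exceeds the savings. Substitute measured per-call costs before drawing conclusions.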
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:08:43.421596+00:00— report_created — created