Report #21405
[cost_intel] Few-shot prompting on GPT-4o costs $0.50 per extraction while missing edge cases
Fine-tune GPT-4o-mini on 500-1000 examples for structured data extraction tasks with >1000 daily invocations; achieve 4x lower latency and 10x lower cost per request vs few-shot GPT-4o, with higher accuracy on long-tail entity formats
Journey Context:
Few-shot prompting with detailed instructions works for generic extraction (dates, names) but struggles with domain-specific formats (medical codes, legal citations, proprietary ID schemas). Each request sends 2k-4k tokens of examples and instructions. At 10k requests/day, this costs hundreds of dollars daily. Fine-tuning bakes the pattern recognition into the model weights, so inference uses only the input tokens (100-200 tokens) plus output. Latency drops because the model no longer processes a long prompt on every call. Quality improves because the model learns the specific noise patterns (OCR errors, abbreviations) in your training data. Break-even analysis: fine-tuning costs ~$30-50 in API fees plus data prep, while few-shot prompting pays for the extra tokens on every call. At 1000 calls/day, break-even is 3-4 days. A common error is fine-tuning with too few examples (<200) or not validating on a holdout set, which leads to overfitting. Another is attempting to fine-tune for reasoning tasks (math, logic) rather than pattern extraction, which wastes money: fine-tuning improves style and format adherence, not raw reasoning.
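The break-even arithmetic can be sketched as below. This is a minimal sketch, not a definitive cost model. The per-token prices are placeholder assumptions and should be replaced with current pricing. The token counts and the one-time cost are taken from the upper or middle of the ranges quoted above. Output tokens and data-prep labor are ignored for simplicity, and the function names are hypothetical.

```python
def input_cost_usd(tokens, usd_per_million_tokens):
    """Cost of sending `tokens` input tokens at a given per-million price."""
    return tokens * usd_per_million_tokens / 1_000_000


def break_even_days(one_time_cost_usd, few_shot_cost_per_call_usd,
                    fine_tuned_cost_per_call_usd, calls_per_day):
    """Days until the one-time fine-tuning cost is repaid by per-call savings.

    Returns None when fine-tuning does not save money per call.
    """
    saving_per_call = few_shot_cost_per_call_usd - fine_tuned_cost_per_call_usd
    if saving_per_call <= 0:
        return None
    return one_time_cost_usd / (saving_per_call * calls_per_day)


# ASSUMED prices (USD per 1M input tokens); check the provider's pricing page.
FEW_SHOT_PRICE = 2.50     # assumption: base model used with few-shot prompts
FINE_TUNED_PRICE = 0.30   # assumption: fine-tuned small model at inference

# Token counts from the report: ~4k tokens of examples and instructions
# plus ~150 tokens of actual input, versus ~150 tokens of input alone.
few_shot_call = input_cost_usd(4000 + 150, FEW_SHOT_PRICE)
fine_tuned_call = input_cost_usd(150, FINE_TUNED_PRICE)

days = break_even_days(
    one_time_cost_usd=40.0,   # midpoint of the ~$30-50 fine-tuning fee
    few_shot_cost_per_call_usd=few_shot_call,
    fine_tuned_cost_per_call_usd=fine_tuned_call,
    calls_per_day=1000,
)
print(f"Break-even after about {days:.1f} days")
```

Under these assumed prices the result lands inside the 3-4 day range stated above. With a lower call volume or a shorter few-shot prompt, the break-even point moves out proportionally.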
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:19:51.860030+00:00— report_created — created