Report #75790
[cost\_intel] 10-shot GPT-4o prompting for structured data extraction from 500k invoices/month costs $0.12 per page, because the few-shot examples bloat the token count of every request
Fine-tune GPT-4o-mini on 2k-5k labeled examples, then remove the few-shot examples from the prompt \(saving ~1500 tokens/request\). The fine-tuned mini model reaches 92% accuracy versus 85% for few-shot GPT-4o, at $0.004 per page \(mini input $0.15/1M tokens, output $0.60/1M tokens; ~500 tokens total per request versus 2000\+ for few-shot GPT-4o\). Break-even is reported at 10k pages. At 500k pages/month the saving is about $58k/month \(500k × \($0.12 − $0.004\)\).
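The savings arithmetic can be checked with a short sketch. It uses only the per-page costs, volume, and break-even point quoted in this report; the function names are hypothetical, and the implied upfront cost is derived from those reported figures rather than stated in the report.

```python
# Figures quoted in this report (not independently measured).
PAGES_PER_MONTH = 500_000
FEW_SHOT_COST_PER_PAGE = 0.12     # USD, 10-shot GPT-4o
FINE_TUNED_COST_PER_PAGE = 0.004  # USD, fine-tuned GPT-4o-mini
BREAK_EVEN_PAGES = 10_000


def saving_per_page(before: float, after: float) -> float:
    """USD saved on each page by switching from `before` to `after`."""
    return before - after


def monthly_saving(pages: int, before: float, after: float) -> float:
    """USD saved per month at a given page volume."""
    return pages * saving_per_page(before, after)


def implied_upfront_cost(break_even_pages: int, before: float, after: float) -> float:
    """Upfront spend that is recovered after `break_even_pages` pages."""
    return break_even_pages * saving_per_page(before, after)


if __name__ == "__main__":
    monthly = monthly_saving(
        PAGES_PER_MONTH, FEW_SHOT_COST_PER_PAGE, FINE_TUNED_COST_PER_PAGE
    )
    upfront = implied_upfront_cost(
        BREAK_EVEN_PAGES, FEW_SHOT_COST_PER_PAGE, FINE_TUNED_COST_PER_PAGE
    )
    print(f"monthly saving: ${monthly:,.0f}")          # monthly saving: $58,000
    print(f"implied upfront cost: ${upfront:,.0f}")    # implied upfront cost: $1,160
```

Read this way, a 10k-page break-even corresponds to roughly $1,160 of upfront spend on training data and fine-tuning, assuming the reported per-page costs hold.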
Journey Context:
A common mistake is assuming that fine-tuning is more expensive and complex than prompt engineering. For structured extraction \(low creativity, high schema adherence\), a fine-tuned small model beats a large model with a giant few-shot prompt. The dominant error mode shifts from 'format violations' \(fixable via validation rules\) to 'field hallucinations' \(caught by regex checks on field values\). Token math: 10-shot prompting adds ~1k-2k tokens of examples to every request, while the fine-tuned model needs only the instruction \(~50 tokens\). At high volume, the upfront cost of generating training data is recovered within days.
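The two error modes described above can be caught by a small post-processing check. The sketch below is illustrative only: the field names and regex patterns are hypothetical and would have to match the real invoice schema. Structural problems are reported as `format` and implausible field values as `value`.

```python
import json
import re

# Hypothetical schema: field names and patterns are assumptions for illustration.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"[A-Z0-9][A-Z0-9\-]{2,19}"),
    "invoice_date": re.compile(r"\d{4}-\d{2}-\d{2}"),  # ISO date, e.g. 2024-03-15
    "total": re.compile(r"\d+\.\d{2}"),                # decimal amount as a string
}


def validate_extraction(raw_output: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["format: output is not valid JSON"]
    if not isinstance(data, dict):
        return ["format: output is not a JSON object"]

    problems = []
    for field, pattern in FIELD_PATTERNS.items():
        if field not in data:
            # Format violation: the schema was not followed.
            problems.append(f"format: missing field {field}")
            continue
        value = data[field]
        # Value check: a field that does not fit its pattern is flagged
        # as a possible hallucination.
        if not isinstance(value, str) or not pattern.fullmatch(value):
            problems.append(f"value: {field}={value!r} fails its pattern")
    return problems
```

Pages that return a non-empty list could be retried or routed to manual review. A regex only checks that a value has a plausible shape; it cannot confirm that the value actually appears on the invoice.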
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:48:40.355013+00:00— report_created — created