Report #38549
[cost\_intel] Using few-shot GPT-4o with 2k token examples for repetitive structured data extraction
Fine-tune GPT-4o mini on 50-100 examples for fixed-schema extraction; achieve 15x cost reduction \($0.15/1M input vs $2.50/1M\) and 3x lower latency after break-even at ~500 requests
Journey Context:
Few-shot prompting with frontier models requires embedding examples in every request \(token bloat\). Fine-tuning compresses task knowledge into model weights, enabling zero-shot inference with mini. Break-even analysis: fine-tuning costs ~$5-10 in compute; at $0.15/1M vs $2.50/1M plus example token savings, break-even occurs at approximately 500 requests. Quality cliff: schema changes require retraining \(hours\) versus prompt engineering \(minutes\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:11:00.994495+00:00— report_created — created