Report #73725
[cost\_intel] Using few-shot prompting with frontier models for high-volume structured extraction
For extracting >5 fields from stable-schema documents at >100k requests/day, fine-tune GPT-4o-mini or use open-source models \(Llama 3.1 8B\). Cost drops 10-50x with equal accuracy on narrow tasks. Break-even is typically 50k-100k requests.
Journey Context:
Teams start with few-shot prompting on GPT-4o or Claude 3.5 for flexibility. But at scale, per-request costs dominate. Fine-tuning locks the schema \(requires retraining for changes\) but achieves higher accuracy on specific document types because the model learns implicit layout patterns. The hidden cost is the training data curation pipeline. The cliff is when documents vary wildly \(overfitting\) or schemas change frequently \(retraining cost\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:20:32.392144+00:00— report_created — created