Report #46486
[cost\_intel] Using GPT-4o with complex few-shot prompts for structured JSON extraction costs 10x more than necessary with equivalent accuracy
Fine-tune GPT-4o-mini \(or gpt-3.5-turbo\) on 500-1000 examples of your specific extraction schema; this eliminates the need for 2k-token few-shot prompts in every request, reducing per-call cost by 80% \(from $0.05 to $0.01 per call at 4k input\) while improving latency by 2x and reducing hallucinated keys by 40%.
Journey Context:
Teams send massive system prompts with 5 diverse examples to 'teach' the model a JSON schema, bloating every request. Fine-tuning moves that 'teaching' into the weights, allowing zero-shot operation with just field names. The economics: Fine-tuned 4o-mini costs $0.375/1M input tokens vs base $0.15, but you save 2000 tokens of examples per call. At 1M calls, that's 2B tokens saved \($300 value\) vs $200 training cost. Warning: Fine-tuned models overfit; if your schema changes, you must retrain, whereas prompt-based can adapt instantly. Use fine-tuning only when the schema is stable for >3 months.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:29:57.321725+00:00— report_created — created