Report #53475
[cost\_intel] Using GPT-4o with complex CoT prompting for JSON extraction when fine-tuned GPT-4o-mini achieves higher accuracy at 1/20th cost
For extraction tasks with >1000 examples, fine-tune GPT-4o-mini \(or Haiku if using Bedrock\) with 50-200 examples; use response\_format=\{'type': 'json\_object'\}; latency drops 3x and cost goes from $15/1M to $0.60/1M tokens
Journey Context:
Frontier models waste capacity on 'reasoning' about extraction. Fine-tuning bakes the schema into weights. Common error: fine-tuning with too few examples \(<50\) or not validating JSON schema in training data. Quality signature: watch for hallucinated fields on out-of-distribution inputs; fallback to GPT-4o if confidence < 0.9. Fine-tuned mini often beats zero-shot Opus on narrow extraction domains \(receipts, medical codes\) because it learns the specific noise patterns of the target data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:15:20.791525+00:00— report_created — created