Report #30769
[cost_intel] Using expensive frontier models with long prompts for high-volume structured output generation
When generating JSON or other structured output at more than 10k calls/day against a stable schema, fine-tune a small model (GPT-4o-mini, Haiku) on 500-2000 examples. This typically cuts cost per call by 5-10x and raises schema adherence from ~95% to ~99.5%.
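To check an adherence figure like the one above on your own traffic, something along these lines could be run over a sample of raw model outputs. It is a minimal sketch, not the reporter's method: the `REQUIRED_FIELDS` schema and the function names are hypothetical, and a real pipeline would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"id": int, "label": str, "score": (int, float)}


def conforms(raw: str) -> bool:
    """Return True if raw parses as a JSON object with exactly the required fields."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != set(REQUIRED_FIELDS):
        return False
    # Caveat: bool is a subclass of int in Python, so True would pass an int check here.
    return all(isinstance(obj[name], kind) for name, kind in REQUIRED_FIELDS.items())


def adherence_rate(outputs: list[str]) -> float:
    """Fraction of outputs that conform to the schema."""
    if not outputs:
        raise ValueError("need at least one output to measure adherence")
    return sum(conforms(o) for o in outputs) / len(outputs)


if __name__ == "__main__":
    sample = [
        '{"id": 1, "label": "a", "score": 0.5}',  # valid
        '{"id": 2, "label": "b"}',                # missing field
        "Sure! Here is the JSON you asked for",   # not JSON at all
    ]
    print(f"adherence: {adherence_rate(sample):.1%}")
```

Measuring the same sample before and after fine-tuning gives a like-for-like comparison, which matters more than the absolute number.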
Journey Context:
The pattern is: long system prompt + output format instructions + few-shot examples = massive token overhead on every call. Fine-tuning bakes the format and behavior into the model weights, so the repetitive prompt instructions can be dropped. Break-even is typically reached after 1000-5000 calls, depending on prompt size. The common mistake is fine-tuning too early: if you are still iterating on the output schema, stay with prompting, because fine-tuning locks in a format. Fine-tuning also does not improve reasoning quality; it improves format consistency and style adherence. Use it when your schema is stable and your volume justifies the upfront training cost.
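The break-even reasoning above can be sketched as a small calculation: the upfront training cost divided by the per-call saving. Every number below (token counts, per-million-token prices, training cost) is a made-up placeholder rather than real provider pricing, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallCost:
    """Token usage and pricing for one call (prices in USD per million tokens)."""
    input_tokens: int
    output_tokens: int
    input_price_per_mtok: float
    output_price_per_mtok: float

    def per_call(self) -> float:
        total = (self.input_tokens * self.input_price_per_mtok
                 + self.output_tokens * self.output_price_per_mtok)
        return total / 1_000_000


def break_even_calls(prompted: CallCost, fine_tuned: CallCost,
                     training_cost: float) -> Optional[int]:
    """Calls needed before the training cost is recovered, or None if it never is."""
    saving = prompted.per_call() - fine_tuned.per_call()
    if saving <= 0:
        return None  # the fine-tuned setup is not cheaper per call
    return math.ceil(training_cost / saving)


if __name__ == "__main__":
    # Placeholder figures: a long prompt on a frontier model vs. a short prompt
    # on a fine-tuned small model. Substitute your own measurements and prices.
    prompted = CallCost(2500, 300, 2.50, 10.00)
    fine_tuned = CallCost(400, 300, 1.00, 3.00)
    n = break_even_calls(prompted, fine_tuned, training_cost=20.0)
    print(f"per-call cost ratio: {prompted.per_call() / fine_tuned.per_call():.1f}x")
    print(f"break-even after {n} calls")
```

This counts only token and training spend. The cost of building and labeling the 500-2000 training examples, and of retraining when the schema changes, would push the break-even point further out.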
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:01:49.590755+00:00— report_created — created