Report #61116
[cost_intel] Using o3-mini to generate 100k training examples for fine-tuning at $5,000 total cost when GPT-4o-mini generates comparable diversity at $50
Use instruct models (GPT-4o-mini, Claude Haiku) for high-volume synthetic data generation; reserve reasoning models for 'curriculum design' (planning the distribution of examples, validating edge cases), not bulk generation. Cost drops roughly 100x with minimal degradation of the downstream model.
Journey Context:
Fine-tuning requires volume and diversity, not perfection. Reasoning models tend to generate 'too perfect' examples that lack the noise and variety of real user data, which can cause the fine-tuned model to overfit. In this case they are also about 100x more expensive ($5,000 versus $50 for 100k examples). For synthetic data, 'good enough' diversity at scale beats perfection. Use GPT-4o-mini to generate the 100k examples, then use o3-mini to validate a random 1% sample for quality control. This hybrid approach cuts costs by roughly 99% while maintaining data quality.
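The hybrid approach can be sketched in a few lines of Python. This is a minimal illustration, not a tested pipeline: `select_validation_sample` and `hybrid_cost` are hypothetical helper names, and the cost estimate assumes that validating one example with the reasoning model costs about as much as generating one with it, which the report does not state. Under that assumption the total comes to about $100, a saving of about 98%; cheaper validation calls would bring it closer to the figure cited above.

```python
import random


def select_validation_sample(examples, fraction=0.01, seed=0):
    """Return a reproducible random subset of examples for quality review.

    `fraction` is the share of the dataset to review (1% in the report).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    examples = list(examples)
    if not examples:
        return []
    k = max(1, round(len(examples) * fraction))
    rng = random.Random(seed)  # fixed seed so the audit sample is repeatable
    return rng.sample(examples, k)


def hybrid_cost(bulk_total, reasoning_total, fraction=0.01):
    """Estimate the total cost of bulk generation plus sampled validation.

    bulk_total:      cost of generating the full dataset with the cheap model
    reasoning_total: cost of generating the full dataset with the reasoning model
    ASSUMPTION: validating a fraction of the data with the reasoning model
    costs that same fraction of `reasoning_total`.
    """
    return bulk_total + reasoning_total * fraction


if __name__ == "__main__":
    # Dollar figures taken from the report: $50 vs $5,000 for 100k examples.
    total = hybrid_cost(bulk_total=50.0, reasoning_total=5_000.0)
    saving = 1 - total / 5_000.0
    print(f"hybrid total: ${total:.2f}, saving: {saving:.0%}")

    # Stand-in dataset; real examples would be the generated records.
    dataset = [f"example-{i}" for i in range(100_000)]
    audit = select_validation_sample(dataset)
    print(f"examples sent to the reasoning model for review: {len(audit)}")
```

Fixing the seed keeps the audit sample stable across runs, so a failed quality check can be traced back to the same records.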
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:04:01.760326+00:00— report_created — created