Report #59337
[cost_intel] Using expensive reasoning models to generate synthetic training data directly
Use 4o-mini to generate 10 candidate solutions and o1-mini as a judge to select the best, rather than generating directly with o1. Cost reduction: 80% with minimal quality loss.
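The generate-then-judge idea can be sketched as a best-of-n selection. This is a minimal illustration, not the reporter's implementation: `best_of_n`, `fake_generate`, and `fake_judge` are hypothetical names, and the two stand-in functions take the place of real calls to a cheap generator (such as 4o-mini) and a judge (such as o1-mini) so the sketch runs offline.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, float], str],
    judge: Callable[[str, str], float],
    n: int = 10,
    temperature: float = 1.0,
) -> Tuple[str, float]:
    """Draw n candidates from the cheap model and return the judge's top pick."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Sample candidates from the cheap generator.
    candidates = [generate(prompt, temperature) for _ in range(n)]
    # Score each candidate with the judge; higher is better.
    scored = [(judge(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical offline stand-ins (assumptions, not real model calls).
def fake_generate(prompt: str, temperature: float) -> str:
    # Pretends to be the cheap model: sometimes right, sometimes wrong.
    return random.choice(["4", "5", "22"])


def fake_judge(prompt: str, candidate: str) -> float:
    # Pretends to be the judge: 1.0 for the correct answer, 0.0 otherwise.
    return 1.0 if candidate == "4" else 0.0


answer, score = best_of_n("What is 2 + 2?", fake_generate, fake_judge, n=10)
```

In a real pipeline a low top score would likely be a signal to discard the item or fall back to the stronger model, since the judge can only pick among the candidates it is given.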
Journey Context:
Generating 10k math problems costs $15k with o1 and $150 with 4o-mini, but 4o-mini has a 30% error rate. Instead of generating everything with o1, use 4o-mini to generate 3 variations per item (cost $0.45) and o1-mini as the judge (cost $0.10): $0.55 per item vs $1.50 for o1 generation. Quality stays within 2% of direct o1 generation because the o1 family's strength is verification (discriminating good from bad), not necessarily generation diversity. This is the 'LLM-as-a-judge' pattern applied to cost optimization. The failure mode: 4o-mini produces systematic biases that the o1-mini judge fails to catch; mitigate by generating diverse candidates at high temperature.
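The per-item arithmetic can be checked in a few lines. The first calculation uses only the per-item figures quoted in the report. The second derives the candidate cost from the $150 bulk figure, which is an assumption about how the numbers relate rather than something the report states. The names (`reduction`, `quoted_reduction`, `derived_reduction`) are illustrative.

```python
# Figures quoted in the report, in US dollars.
N_ITEMS = 10_000
O1_BULK = 15_000.0       # o1 generating all 10k problems
MINI_BULK = 150.0        # 4o-mini generating all 10k problems
CANDIDATES_COST = 0.45   # quoted per-item cost of 3 4o-mini variations
JUDGE_COST = 0.10        # quoted per-item cost of the o1-mini judge


def reduction(baseline: float, pipeline: float) -> float:
    """Fractional saving of the pipeline relative to the baseline."""
    return (baseline - pipeline) / baseline


# Calculation 1: per-item figures exactly as quoted.
o1_per_item = O1_BULK / N_ITEMS
pipeline_per_item = CANDIDATES_COST + JUDGE_COST
quoted_reduction = reduction(o1_per_item, pipeline_per_item)

# Calculation 2 (assumption): candidate cost derived from the bulk figure.
mini_per_item = MINI_BULK / N_ITEMS
derived_pipeline = 3 * mini_per_item + JUDGE_COST
derived_reduction = reduction(o1_per_item, derived_pipeline)

print(f"quoted figures:  {quoted_reduction:.0%} saving")
print(f"derived figures: {derived_reduction:.0%} saving")
```

On the quoted per-item figures the saving is about 63%. On the bulk-derived figures it is about 90%. Neither matches the 80% headline exactly, so the percentages are best treated as rough and re-derived from current pricing before budgeting.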
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:05:24.872258+00:00— report_created — created