Report #55168
[cost\_intel] Using cheap instruct models for synthetic data generation of complex reasoning traces
Use o3-mini-high to generate synthetic CoT data for math, coding, and logic tasks; reserve GPT-4o for simple classification labels or paraphrasing. The reasoning model costs roughly 10-20x more per sample, but the quality of the generated reasoning traces determines the final model's capability, so the higher per-sample cost is still economical for training data.
Journey Context:
Synthetic data for fine-tuning needs a high signal-to-noise ratio. Instruct models that generate reasoning traces tend to produce 'fluent but wrong' chains: they confidently state incorrect mathematical steps or commit logical fallacies. When this data is used to fine-tune a student model, the student learns to mimic the errors \(the 'student becomes as dumb as the teacher' problem\). Reasoning models are slower to generate traces, but those traces are more likely to be correct, so they serve as higher-quality teacher data. The cost is justified because the dataset is generated once and then reused for every training run of the student model. The quality degradation signature in cheap synthetic data is 'perplexing consistency': the cheap model generates outputs that look structurally correct but fail verification when executed \(e.g., code that doesn't run, math that doesn't check\). This poisons the training set with 'attractive nuisances', wrong examples that look right.
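The verification failures described above (code that doesn't run, math that doesn't check) suggest filtering traces before they reach the training set. Below is a minimal sketch of such a filter, not a definitive pipeline. The `Trace` record, its `kind` labels, and the helper names are hypothetical and chosen only for illustration; it assumes each math sample comes with a trusted reference answer and each code sample is a standalone Python script.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Trace:
    """One synthetic sample (hypothetical schema, for illustration only)."""
    reasoning: str       # the generated chain of thought
    kind: str            # "math" or "code" (assumed labels)
    artifact: str        # final answer for math, Python source for code
    expected: str = ""   # trusted reference answer, used only for math


def math_checks(trace: Trace) -> bool:
    # Assumption: a trusted reference answer exists; compare after trimming.
    return trace.artifact.strip() == trace.expected.strip()


def code_runs(trace: Trace, timeout_s: float = 5.0) -> bool:
    # Run the generated code in a separate interpreter process.
    # Note: a subprocess is NOT a sandbox; isolate untrusted code properly.
    try:
        result = subprocess.run(
            [sys.executable, "-c", trace.artifact],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    # Exit code 0 means the script finished without an uncaught exception.
    return result.returncode == 0


def filter_verified(traces: list[Trace]) -> list[Trace]:
    """Keep only traces whose artifact passes the check for its kind."""
    checks = {"math": math_checks, "code": code_runs}
    return [t for t in traces if t.kind in checks and checks[t.kind](t)]


if __name__ == "__main__":
    samples = [
        Trace("2 + 2 is 4", "math", "4", "4"),
        Trace("2 + 2 is 5", "math", "5", "4"),
        Trace("prints ok", "code", "print('ok')"),
        Trace("raises", "code", "raise ValueError('bad step')"),
    ]
    for kept in filter_verified(samples):
        print(kept.kind, "->", kept.reasoning)
```

A filter like this only checks the final artifact, so a trace with a wrong intermediate step and a correct final answer would still pass. It reduces 'attractive nuisances' but does not replace a stronger teacher model.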
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:05:28.391511+00:00— report_created — created