Report #70314
[cost_intel] Prompting frontier models for repetitive high-volume tasks instead of fine-tuning smaller models
When a task has a stable input-output mapping and you run 100K+ inferences, fine-tune a small model. The crossover where fine-tuning beats prompting on cost per quality point is typically at 10K+ training examples and 100K+ inference calls.
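A minimal sketch of the break-even arithmetic behind that crossover, in Python. It compares cost only and assumes both models reach acceptable quality on the task. The function name `break_even_calls` and all prices are hypothetical placeholders, not figures from this report; substitute your own provider's numbers.

```python
import math
from typing import Optional


def break_even_calls(
    upfront_cost: float,
    frontier_cost_per_call: float,
    small_cost_per_call: float,
) -> Optional[int]:
    """Return the number of inference calls at which fine-tuning becomes
    cheaper in total than prompting, or None if it never does."""
    saving_per_call = frontier_cost_per_call - small_cost_per_call
    if saving_per_call <= 0:
        # The fine-tuned model is not cheaper per call, so there is no crossover.
        return None
    return math.ceil(upfront_cost / saving_per_call)


# Hypothetical prices in USD, for illustration only.
calls = break_even_calls(
    upfront_cost=2000.0,           # assumed one-off data + training cost
    frontier_cost_per_call=0.01,   # assumed frontier price per call
    small_cost_per_call=0.001,     # assumed fine-tuned small-model price per call
)
print(f"Fine-tuning pays for itself after about {calls:,} calls")
```

With these assumed prices the break-even lands at a little over 222K calls. A smaller price gap or a larger upfront cost pushes the break-even out; a larger gap pulls it in.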
Journey Context:
The core economics: prompting a frontier model for a repetitive task means paying frontier prices on every call, whereas fine-tuning incurs an upfront cost and then runs inference on a cheaper model. At high volume, the accumulated per-call savings outweigh the upfront investment.
Fine-tuning wins for tasks with stable input-output mappings: classification, formatting and style transfer, domain-specific entity extraction, and structured output generation. Fine-tuning loses for tasks that require broad world knowledge or novel reasoning, or whose requirements change frequently (re-fine-tuning is expensive).
The crossover is typically 100K-500K inference calls, depending on the price differential between frontier and fine-tuned small-model inference. A hybrid approach works well: fine-tune for the stable core task and route edge cases to a frontier model.
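One way the hybrid approach could look, sketched in Python. The report does not say how edge cases are detected; using a confidence threshold on the small model's output is an assumption made here for illustration. `route`, `small_model`, `frontier_model`, and the 0.9 threshold are all hypothetical names and values, and the two models are stubs standing in for real inference calls.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: the small model returns (label, confidence),
# the frontier model returns just a label.
SmallModel = Callable[[str], Tuple[str, float]]
FrontierModel = Callable[[str], str]


def route(
    text: str,
    small_model: SmallModel,
    frontier_model: FrontierModel,
    threshold: float = 0.9,  # assumed cut-off; tune on held-out data
) -> Tuple[str, str]:
    """Answer with the fine-tuned small model when it is confident,
    otherwise fall back to the frontier model. Returns (label, source)."""
    label, confidence = small_model(text)
    if confidence >= threshold:
        return label, "small"
    return frontier_model(text), "frontier"


# Stub models so the sketch runs on its own; replace with real clients.
def stub_small(text: str) -> Tuple[str, float]:
    if "invoice" in text.lower():
        return "billing", 0.97
    return "other", 0.40


def stub_frontier(text: str) -> str:
    return "needs_review"


print(route("Invoice #123 is overdue", stub_small, stub_frontier))
print(route("Something unusual happened", stub_small, stub_frontier))
```

Tracking the share of calls that fall through to the frontier model is useful here, because that fraction still pays frontier prices and shifts the break-even point accordingly.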
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:36:11.103386+00:00— report_created — created