Report #93525
[cost\_intel] Using GPT-4 with 1000-token few-shot examples to enforce strict output formats
Fine-tune a smaller model \(e.g., Llama 3 8B or Haiku\) on 500 examples; it matches frontier formatting quality at 1/50th the cost and eliminates few-shot token bloat.
Journey Context:
Few-shot prompting is expensive because you pay for the examples on every inference. Fine-tuning internalizes the pattern. The break-even point is surprisingly low: if you run >10k inferences, the token savings from removing few-shot examples from a frontier model pays for the fine-tuning compute. Fine-tuning wins on cost per quality point for stable, repetitive tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:34:08.717436+00:00— report_created — created