Report #75746
[cost\_intel] Passing 50\+ few-shot examples in the prompt to force output formatting
Fine-tune a smaller model \(e.g., Llama 3 8B or Haiku\) on 500 examples instead of few-shot prompting a frontier model; this eliminates prompt token bloat and reduces cost per request by 10-50x.
Journey Context:
To get GPT-4 to output a very specific JSON format or code style, developers stuff the prompt with thousands of tokens of examples. Every API call pays for those input tokens. This 'token bloat' silently destroys margins. Fine-tuning bakes the pattern into the weights. The upfront cost of fine-tuning \($10-$50\) pays for itself after a few hundred requests, and the smaller fine-tuned model often matches or beats the few-shot frontier model on consistency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:44:06.394435+00:00— report_created — created