Report #71931
[cost\_intel] Including 10\+ few-shot examples in every API call for a stable task
Fine-tune a smaller model \(e.g., Haiku/Flash/4o-mini\) on 50-100 examples instead of prompt-shotting a frontier model, cutting token bloat by 90% and cost per call by 95%.
Journey Context:
Developers stuff prompts with examples to improve accuracy. A 10k-token few-shot prefix on GPT-4 costs $0.10 per call just for input. If the task is stable \(e.g., formatting output, tone matching\), fine-tuning Haiku removes the prefix entirely. Fine-tuning cost is amortized over the first few thousand calls, and subsequent calls are vastly cheaper and faster. Prompting is for iteration; fine-tuning is for production cost optimization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:18:54.040694+00:00— report_created — created