Report #98076
[cost\_intel] Very long few-shot prompts are shipped to a frontier model for repetitive structured tasks
For stable, high-volume tasks (classification, formatting, domain-specific extraction), fine-tune a small model such as gpt-4.1-mini, gpt-4.1-nano, or gpt-4o-mini so that a short prompt suffices, latency drops, and you pay the cheaper model's rate. Use evals to confirm that quality holds; do not fine-tune on facts that change.
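One way to make the eval step concrete is a simple quality gate: score both the long-prompt baseline and the fine-tuned candidate on the same labeled set, and accept the candidate only if it stays within a tolerance of the baseline. The sketch below is illustrative only; the function names, the exact-match metric, and the `max_drop` threshold are assumptions, not part of any vendor tooling.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Exact-match accuracy of predictions against gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold set must not be empty")
    matches = sum(p == g for p, g in zip(predictions, gold))
    return matches / len(gold)


def passes_gate(
    baseline_preds: Sequence[str],
    tuned_preds: Sequence[str],
    gold: Sequence[str],
    max_drop: float = 0.01,  # hypothetical tolerance; choose per task
) -> bool:
    """True if the tuned model is no more than max_drop below the baseline."""
    return accuracy(tuned_preds, gold) >= accuracy(baseline_preds, gold) - max_drop
```

Exact match suits classification and strict formats; extraction tasks may need a field-level or fuzzy metric instead.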
Journey Context:
Long few-shot prompts burn input tokens on every call, and outputs may still drift from the intended pattern. Fine-tuning bakes the pattern into the model weights, so the runtime prompt can shrink to a short instruction. OpenAI lists supervised fine-tuning (SFT) as best suited to classification, nuanced translation, and format generation, and reinforcement fine-tuning (RFT) to complex domain reasoning. The savings compound once call volume is high enough to amortize the training cost.
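The amortization argument can be checked with a rough break-even estimate: divide the one-time training cost by the per-call saving. The sketch below assumes a simple per-token pricing model; `CallCost`, `break_even_calls`, and every number in the usage example are hypothetical placeholders, not real prices, so substitute figures from the current pricing page.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallCost:
    """Token counts and prices (USD per 1M tokens) for one request."""
    input_tokens: int
    output_tokens: int
    input_price_per_1m: float
    output_price_per_1m: float

    def per_call(self) -> float:
        """Cost in USD of a single call."""
        total = (self.input_tokens * self.input_price_per_1m
                 + self.output_tokens * self.output_price_per_1m)
        return total / 1_000_000


def break_even_calls(baseline: CallCost, tuned: CallCost,
                     training_cost: float) -> Optional[int]:
    """Calls needed before per-call savings cover the training cost.

    Returns None when the tuned setup is not cheaper per call.
    """
    saving = baseline.per_call() - tuned.per_call()
    if saving <= 0:
        return None
    return math.ceil(training_cost / saving)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    long_prompt = CallCost(4000, 50, input_price_per_1m=2.0, output_price_per_1m=8.0)
    short_prompt = CallCost(200, 50, input_price_per_1m=1.0, output_price_per_1m=4.0)
    print(break_even_calls(long_prompt, short_prompt, training_cost=100.0))
```

If the estimate exceeds the call volume expected before the task or its data changes, the long prompt may remain the cheaper option.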
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:11:30.663555+00:00— report_created — created