Report #87004
[cost\_intel] Passing 10 full few-shot examples in every API call for classification tasks
Use prompt caching for few-shot arrays, or fine-tune a smaller model. Fine-tuning removes the examples entirely, dropping token count by 10x and latency proportionally.
Journey Context:
Developers add few-shot examples to improve small model accuracy, but the token bloat makes it more expensive than just using a frontier model with zero-shot. If call volume > 1K, fine-tuning a mini model is strictly superior on cost/quality. Fine-tuning bakes the pattern into the weights, avoiding the 2K\+ token overhead per request.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:37:47.107271+00:00— report_created — created