Report #90511
[cost_intel] Sending 5-10 few-shot examples on every API request in high-volume pipelines
At >10K requests/day with static few-shot examples, either fine-tune a smaller model or move the examples into a cached system prompt. Re-sending the examples costs (input price per token) × (tokens per example) × (examples per request) × (daily request volume). At an illustrative $1/M input tokens, each 400-token example adds $0.0004 per request. At 100K requests/day, 5 examples = 200M extra input tokens = $200/day in few-shot overhead alone.
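The overhead formula is simple enough to check directly. This is a minimal sketch using the report's own numbers. The function name `few_shot_overhead_per_day` is hypothetical, and the $1/M rate is the illustrative price from the summary above; the second call plugs in the GPT-4o rate from the journey context.

```python
def few_shot_overhead_per_day(
    price_per_million_input: float,
    tokens_per_example: int,
    num_examples: int,
    requests_per_day: int,
) -> float:
    """Daily dollar cost of re-sending static few-shot examples on every request.

    Hypothetical helper for illustration: price is in $ per 1M input tokens.
    """
    extra_tokens_per_day = tokens_per_example * num_examples * requests_per_day
    return price_per_million_input * extra_tokens_per_day / 1_000_000


print(few_shot_overhead_per_day(1.00, 400, 5, 100_000))  # 200.0  (illustrative $1/M)
print(few_shot_overhead_per_day(2.50, 400, 5, 100_000))  # 500.0  (GPT-4o at $2.50/M)
```

The cost scales linearly in every factor, so halving the example count or the example length halves the overhead just as reliably as halving traffic.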
Journey Context:
Typical pattern: 5 few-shot examples at 400 tokens each = 2,000 extra input tokens per request. At 100K requests/day on GPT-4o ($2.50/M input), that's 200M tokens, or $500/day (~$15K/month), just for examples. Three alternatives, ranked by savings:
(1) Fine-tune GPT-4o-mini on those examples. Training costs roughly $50-200 one-time, and inference then runs at $0.15/M input, a 94% input-cost reduction versus GPT-4o. That is the base GPT-4o-mini rate; fine-tuned models are billed at a higher per-token rate, so check current pricing. Counting only the removed example tokens ($0.005/request), a $200 fine-tune breaks even at ~40K requests total.
(2) Move the examples into a cached system prompt (Anthropic prompt caching). Each cache write costs 1.25x the base input price, and each cache hit costs 0.1x.
(3) Cut to 1-2 high-quality examples. Studies show diminishing returns beyond 2-3 examples for most tasks, and one well-chosen example plus detailed instructions often matches 5-example performance.
The non-obvious cost: few-shot examples also add latency. The model must process the extra input tokens on every request, which lengthens prompt processing and time to first token.
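The three alternatives can be compared with a back-of-envelope script. This is a rough sketch under stated assumptions. GPT-4o at $2.50/M input, 5 × 400-token examples, 100K requests/day, a $200 fine-tune and the 1.25x/0.1x cache multipliers come from the figures above. The 100 cache writes/day and all function names are hypothetical. The break-even counts only the removed example tokens, which reproduces the ~40K figure. It therefore understates the savings when the rest of the prompt also moves to a cheaper model.

```python
# Back-of-envelope comparison of the three alternatives.
# From the report: $2.50/M input, 5 x 400-token examples, 100K requests/day,
# $200 one-time fine-tune, cache write 1.25x / cache hit 0.1x.
# Hypothetical: 100 cache writes/day; all function names are illustrative.

PRICE_PER_M = 2.50          # GPT-4o input price quoted in the report ($/M tokens)
EXAMPLE_TOKENS = 5 * 400    # few-shot tokens added to every request
REQUESTS_PER_DAY = 100_000


def example_cost_per_request(price_per_m: float, tokens: int) -> float:
    """Dollar cost of the few-shot tokens on a single request."""
    return price_per_m * tokens / 1_000_000


def fine_tune_breakeven_requests(one_time_cost: float, saving_per_request: float) -> float:
    """Requests needed before a one-time fine-tune pays for itself."""
    return one_time_cost / saving_per_request


def cached_prefix_cost_ratio(requests: int, cache_writes: int,
                             write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Cost of a cached prefix relative to resending it uncached on every request.

    Writes are billed at write_mult x the base input price and hits at read_mult x.
    """
    hits = requests - cache_writes
    return (write_mult * cache_writes + read_mult * hits) / requests


per_request = example_cost_per_request(PRICE_PER_M, EXAMPLE_TOKENS)
print(f"Few-shot overhead: ${per_request:.4f}/request, "
      f"${per_request * REQUESTS_PER_DAY:,.0f}/day")
print(f"(1) Fine-tune break-even: "
      f"{fine_tune_breakeven_requests(200, per_request):,.0f} requests")
ratio = cached_prefix_cost_ratio(REQUESTS_PER_DAY, cache_writes=100)
print(f"(2) Cached examples cost {ratio:.1%} of uncached")
print(f"(3) Trimming to 2 examples removes {1 - 2 / 5:.0%} of the overhead")
```

Run as-is, the script prints $500/day of overhead, a 40,000-request break-even, a cached cost of 10.1% of uncached, and a 60% reduction from trimming. The cache ratio depends mostly on how often the cache is rewritten, so bursty traffic that lets the cache expire will land closer to the 1.25x end.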
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:30:57.575637+00:00— report_created — created