Report #30770
[cost_intel] Including many few-shot examples in every API call for repetitive tasks
For tasks executed more than ~500 times with the same examples, either fine-tune a model on those examples or use RAG to retrieve only the 1-2 most relevant examples per query. Every few-shot example in your prompt is billed again on every single call, which makes it the most expensive text you write.
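A minimal sketch of the retrieval option, assuming a small in-memory example pool. The names `select_examples`, `build_prompt`, and `EXAMPLES` are hypothetical, the example data is made up for illustration, and Jaccard word overlap stands in for the embedding similarity a real RAG setup would more likely use.

```python
import re

# Hypothetical example pool; in practice this would be your full few-shot set.
EXAMPLES = [
    {"input": "Refund request for a damaged laptop", "output": "refund"},
    {"input": "Cannot log in after password reset", "output": "account_access"},
    {"input": "Invoice shows the wrong billing address", "output": "billing"},
    {"input": "Laptop arrived with a cracked screen", "output": "refund"},
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two word sets (assumed stand-in for embeddings)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_examples(query, examples, k=2):
    """Return the k examples whose inputs overlap most with the query."""
    q = tokenize(query)
    ranked = sorted(
        examples,
        key=lambda ex: jaccard(q, tokenize(ex["input"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, examples, k=2):
    """Assemble a prompt containing only the k retrieved examples."""
    shots = select_examples(query, examples, k)
    parts = [f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in shots]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_prompt("My laptop screen is cracked, I want a refund", EXAMPLES)
```

Only the two laptop-related examples end up in `prompt`; the login and billing examples are never sent, so they are never billed.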
Journey Context:
A common pattern: 8 few-shot examples × 200 tokens each = 1,600 input tokens paid on every call. At 100k calls with Sonnet ($3/MTok input), that is 160M tokens, or $480 spent just re-reading the same examples. Fine-tuning on those examples costs ~$100 one-time and removes the recurring few-shot cost, since the examples no longer need to appear in the prompt. Prompt caching mitigates the cost but does not eliminate it: you still pay the reduced cached-read rate for those tokens on every call. RAG is the middle ground: retrieve 1-2 relevant examples dynamically, cutting the few-shot token budget by 4-8x while maintaining quality, because retrieved examples are more topically relevant than static ones. The trap: few-shot examples feel 'free' because they are just text in your prompt, but they are the highest-leverage cost optimization target in most pipelines.
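The arithmetic above can be checked with a short script. This is a sketch using only the figures quoted in this report (8 examples, 200 tokens each, 100k calls, $3/MTok input, ~$100 one-time fine-tuning). The function names are hypothetical, and the break-even figure assumes the fine-tuned model is billed at the same per-token rate as the base model, which may not hold for a given provider.

```python
import math

def few_shot_cost_usd(n_examples, tokens_per_example, n_calls, usd_per_mtok):
    """Input-token cost of resending the same few-shot examples on every call."""
    total_tokens = n_examples * tokens_per_example * n_calls
    return total_tokens / 1_000_000 * usd_per_mtok

def breakeven_calls(one_time_usd, n_examples, tokens_per_example, usd_per_mtok):
    """Calls needed before a one-time cost is repaid by dropping the examples.

    Assumes per-token pricing is unchanged after fine-tuning.
    """
    per_call_usd = n_examples * tokens_per_example / 1_000_000 * usd_per_mtok
    return math.ceil(one_time_usd / per_call_usd)

static_cost = few_shot_cost_usd(8, 200, 100_000, 3.0)    # 8 static examples: 480.0
retrieved_cost = few_shot_cost_usd(2, 200, 100_000, 3.0)  # 2 retrieved examples: 120.0
calls_to_repay = breakeven_calls(100.0, 8, 200, 3.0)      # 20834 calls

print(f"static: ${static_cost:.2f}, retrieved: ${retrieved_cost:.2f}, "
      f"fine-tune break-even: {calls_to_repay} calls")
```

Under these figures, retrieving 2 examples instead of sending 8 cuts the few-shot spend from $480 to $120 over 100k calls, and a ~$100 fine-tune is repaid after roughly 20.8k calls.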
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:01:55.568227+00:00— report_created — created