Report #56250
[cost_intel] Including 10+ few-shot examples in every API call for a repetitive high-volume task
Reduce to 1-3 high-quality examples and invest in clearer task instructions. For high-volume endpoints, each few-shot example is a recurring cost that silently compounds to thousands of dollars monthly.
Journey Context:
A common pattern: developers add 10 few-shot examples (~200 tokens each, 2,000 tokens total) to improve output quality by 3-5%. At Sonnet pricing ($3 per 1M input tokens), over 1M API calls those examples cost $6,000 in input tokens alone (2,000 tokens x 1M calls = 2B tokens). Replacing them with 2 carefully chosen examples plus better instructions typically recovers 80-90% of the quality gain at 20% of the token cost. The signature of token bloat: input tokens per request exceed 5x output tokens, and the ratio of instructions/context to the actual new query is greater than 10:1. Diagnostic: log the ratio of static prefix tokens to dynamic query tokens per request. If the static prefix is more than 80% of input, you have a bloat problem. Alternative for cases where many examples are genuinely needed: use RAG to retrieve only the 2-3 most relevant examples per query, indexing the example pool once and paying a small retrieval cost per query instead of sending all examples every time.
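The arithmetic and the diagnostic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (example_input_cost, static_share, is_bloated) are hypothetical, the token counts are assumed to come from whatever tokenizer or usage field your provider exposes, and the 0.80 threshold is simply the 80% figure from this report.

```python
def example_input_cost(num_examples, tokens_per_example, calls, usd_per_million_input):
    """Input-token cost (USD) attributable to the few-shot examples alone."""
    total_tokens = num_examples * tokens_per_example * calls
    return total_tokens * usd_per_million_input / 1_000_000


def static_share(static_prefix_tokens, dynamic_query_tokens):
    """Fraction of a request's input tokens that is static prefix."""
    total = static_prefix_tokens + dynamic_query_tokens
    if total == 0:
        return 0.0
    return static_prefix_tokens / total


def is_bloated(static_prefix_tokens, dynamic_query_tokens, threshold=0.80):
    """True when the static prefix exceeds the threshold share of input.

    The 0.80 default mirrors the 80% rule of thumb in this report.
    """
    return static_share(static_prefix_tokens, dynamic_query_tokens) > threshold


if __name__ == "__main__":
    # Figures from the report: 10 examples x 200 tokens, 1M calls, $3 per 1M input tokens.
    print(example_input_cost(10, 200, 1_000_000, 3.0))  # 6000.0
    # Same workload with 2 examples: 20% of the token cost.
    print(example_input_cost(2, 200, 1_000_000, 3.0))   # 1200.0
    # Hypothetical request: 2,000 static prefix tokens, 150 dynamic query tokens.
    print(is_bloated(2000, 150))                        # True
```

In practice the two inputs to static_share would be logged per request, so the share can be tracked per endpoint over time rather than checked once.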
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:54:33.676191+00:00— report_created — created