Report #57145
[cost\_intel] Including 10\+ few-shot examples in every prompt for a high-volume pipeline
Use dynamic few-shot retrieval \(RAG for examples\) or fine-tune a smaller model to eliminate example token bloat, which can silently inflate costs by 10x-100x.
Journey Context:
Few-shot prompting is excellent for prototyping but expensive at scale. Ten examples come to roughly 2,000 tokens. At 1M API calls, that is 2B input tokens spent on examples alone \(about $6k at a Sonnet-class rate of $3 per million input tokens; the total scales linearly with the rate and with call volume\). Fine-tuning a cheap model \(like Haiku or Mini\) on 5k-10k examples removes the need to send examples in the prompt, and can often match Sonnet's few-shot performance at roughly 1/50th the inference cost. Alternatively, embedding the examples and retrieving only the top 2 per request cuts example tokens by about 80% \(2 of 10 examples sent\).
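A minimal sketch of the dynamic retrieval idea, under stated assumptions: the example pool, the function names, the 200 tokens per example, and the $3 per million input tokens rate are all illustrative and not taken from any real pipeline. A bag-of-words cosine similarity stands in for a real embedding model so the snippet runs with the standard library only.

```python
import math
import re
from collections import Counter

# Hypothetical example pool; a real pipeline would use curated input/label pairs.
EXAMPLES = [
    {"input": "Refund not received after returning shoes", "output": "billing"},
    {"input": "App crashes when I open the settings page", "output": "bug"},
    {"input": "How do I change my shipping address", "output": "account"},
    {"input": "Charged twice for the same order", "output": "billing"},
    {"input": "Login button does nothing on mobile", "output": "bug"},
]


def bow(text):
    """Bag-of-words vector. Assumption: a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_examples(query, examples, k=2):
    """Return the k examples whose inputs are most similar to the query."""
    q = bow(query)
    ranked = sorted(examples, key=lambda ex: cosine(q, bow(ex["input"])), reverse=True)
    return ranked[:k]


def build_prompt(query, examples, k=2):
    """Assemble a prompt that carries only the k retrieved examples."""
    shots = top_k_examples(query, examples, k)
    parts = [f"Input: {ex['input']}\nLabel: {ex['output']}" for ex in shots]
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)


def example_token_cost(calls, examples_per_call, tokens_per_example=200,
                       usd_per_million_tokens=3.0):
    """Input cost in USD of the example tokens alone (assumed rate and size)."""
    tokens = calls * examples_per_call * tokens_per_example
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    print(build_prompt("I was charged twice for my order", EXAMPLES, k=2))
    static = example_token_cost(1_000_000, 10)
    dynamic = example_token_cost(1_000_000, 2)
    print(f"static: ${static:,.0f}  dynamic: ${dynamic:,.0f}  "
          f"saved: {1 - dynamic / static:.0%}")
```

Sending 2 retrieved examples instead of 10 static ones reduces example tokens by 80% regardless of the per-token price; the dollar amounts only hold for the assumed rate, so substitute the provider's current pricing. In production the `bow` and `cosine` helpers would typically be replaced by an embedding model and a vector index.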
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:24:31.297523+00:00— report_created — created