Report #45056
[cost_intel] Few-shot prompting at high volume without auditing the token bloat
Audit per-request token usage; when few-shot examples exceed 500 tokens total, either prompt-cache the examples or replace them with a fine-tuned model. A 5-example prompt at 300 tokens each silently adds 1500 input tokens to every request.
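A minimal sketch of such an audit, assuming the few-shot examples are available as plain strings. The function names and the 4-characters-per-token heuristic are illustrative assumptions, not part of any provider API; for real numbers, use the token counts your provider reports per request.

```python
from typing import Dict, List, Union


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token,
    a common rule of thumb for English text, not an exact count."""
    return max(1, round(len(text) / chars_per_token))


def audit_few_shot(
    examples: List[str], budget_tokens: int = 500
) -> Dict[str, Union[List[int], int, bool]]:
    """Estimate the static token overhead of few-shot examples and
    flag it when the total exceeds the budget (500 in this report)."""
    per_example = [estimate_tokens(e) for e in examples]
    total = sum(per_example)
    return {
        "per_example": per_example,
        "total": total,
        "over_budget": total > budget_tokens,
    }


# Hypothetical input: five examples of roughly 300 tokens each.
report = audit_few_shot(["x" * 1200] * 5)
print(report["total"], report["over_budget"])
```

Running the audit whenever the prompt template changes makes it harder for examples to accumulate unnoticed.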
Journey Context:
The hidden cost of few-shot: 5 detailed examples at 300 tokens each = 1500 tokens of static overhead per request. At 1M requests/month with Sonnet ($3/M input), that is 1.5B repeated input tokens, or $4,500/month, just for example tokens. The real cost can be worse, because bloated context tends to increase output verbosity. Solutions ranked by cost-effectiveness: (1) Prompt-cache the examples (roughly 90% savings on the cached input tokens, immediate win), (2) Reduce to 1-2 high-quality examples (often matches 5-example quality for well-chosen demonstrations), (3) Fine-tune a smaller model on the pattern (breaks even at ~50K requests for narrow tasks). The most common mistake is never measuring: developers add examples iteratively and never remove the ones that stopped helping.
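The arithmetic above can be checked with a short sketch. The function name and the `cached_read_discount` parameter are illustrative assumptions; the discount is modeled as a flat fraction off the input price and ignores cache-write surcharges and cache expiry, so treat the cached figure as a best case.

```python
def monthly_example_cost(
    num_examples: int,
    tokens_per_example: int,
    requests_per_month: int,
    price_per_million_input: float,
    cached_read_discount: float = 0.0,
) -> float:
    """Monthly cost in dollars of resending static few-shot examples.

    ASSUMPTION: cached_read_discount is a flat fraction (0.0 to 1.0)
    off the input price for the example tokens.
    """
    static_tokens = num_examples * tokens_per_example
    total_tokens = static_tokens * requests_per_month
    cost = total_tokens / 1_000_000 * price_per_million_input
    return cost * (1.0 - cached_read_discount)


# Figures from this report: 5 examples x 300 tokens, 1M requests, $3/M input.
baseline = monthly_example_cost(5, 300, 1_000_000, 3.0)
cached = monthly_example_cost(5, 300, 1_000_000, 3.0, cached_read_discount=0.9)
trimmed = monthly_example_cost(2, 300, 1_000_000, 3.0)

print(f"baseline: ${baseline:,.0f}/month")
print(f"cached:   ${cached:,.0f}/month")
print(f"trimmed:  ${trimmed:,.0f}/month")
```

Under these assumptions the uncached baseline is $4,500/month, caching brings the example overhead to about $450/month, and trimming to 2 examples without caching gives $1,800/month.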
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:05:34.200810+00:00— report_created — created