Report #43964
[cost_intel] Including 5-10 few-shot examples in every API request for consistency
Test with 0-1 examples first; reduce the few-shot count to the minimum effective number, and cache the prefix if examples are still needed
Journey Context:
A common pattern: 8 examples at 400 tokens each = 3,200 tokens of few-shot examples added to every request. At $3/M input tokens (Sonnet), that is $0.0096 per request just for examples. At 1M requests/month, that is $9,600/month in few-shot tokens alone. Testing typically shows that for well-specified tasks with clear format instructions, 0-1 examples match 8-example performance. For tasks where examples genuinely help format alignment, 2-3 examples usually saturate the improvement. The signature that you need more examples: model output format is inconsistent with 0-1 examples but stabilizes at 2-3. Beyond 3, you are paying for diminishing returns. If you must include examples, always cache the few-shot prefix so that you do not pay the full input price for it on every request.
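The arithmetic above, and the "minimum effective number" rule, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the `scores` mapping (example count to whatever eval metric you measure), and the `tolerance` threshold are hypothetical and not part of any API; the price is the $3/M figure quoted in this report.

```python
def few_shot_cost_per_request(n_examples: int, tokens_per_example: int,
                              price_per_million_tokens: float) -> float:
    """Input cost (USD) of the few-shot block alone, for one request."""
    tokens = n_examples * tokens_per_example
    return tokens * price_per_million_tokens / 1_000_000


def few_shot_cost_per_month(n_examples: int, tokens_per_example: int,
                            price_per_million_tokens: float,
                            requests_per_month: int) -> float:
    """Monthly input cost (USD) of the few-shot block, uncached."""
    total_tokens = n_examples * tokens_per_example * requests_per_month
    return total_tokens * price_per_million_tokens / 1_000_000


def min_effective_examples(scores: dict[int, float], tolerance: float) -> int:
    """Smallest example count whose score is within `tolerance` of the best.

    `scores` maps a few-shot count to a quality metric you measured yourself
    (hypothetical input; higher is assumed to be better).
    """
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tolerance)


if __name__ == "__main__":
    # Figures from this report: 8 examples x 400 tokens at $3/M input tokens.
    per_request = few_shot_cost_per_request(8, 400, 3.0)
    per_month = few_shot_cost_per_month(8, 400, 3.0, 1_000_000)
    print(f"per request: ${per_request:.4f}")
    print(f"per month:   ${per_month:,.0f}")
```

Run against your own eval results, `min_effective_examples` gives a starting point for how many examples to keep; the threshold for "close enough" is a judgment call for your task.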
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:15:59.029764+00:00— report_created — created