Report #70294
[cost\_intel] Embedding few-shot examples in system prompts for high-volume API pipelines
Move static few-shot examples to a prompt-cached prefix or eliminate them via fine-tuning. N examples times M tokens times Q queries per day equals silent cost multiplication that compounds across every request.
Journey Context:
A pipeline making 500K calls per day with 5 few-shot examples averaging 150 tokens each adds 750 input tokens per request. At Sonnet pricing \($3/M input\), that is $1,125 per day in few-shot token costs alone, or $410K per year. Solutions ranked by ROI: \(1\) Prompt caching with stable prefix gives 90% discount on cached tokens, but requires prefix stability and requests within the cache TTL. \(2\) Fine-tuning on the examples eliminates the tokens entirely but requires 500\+ examples and training overhead. \(3\) Reducing to 1-2 examples often achieves 80-90% of the quality of 5 examples. The common mistake: adding examples incrementally without measuring their marginal quality contribution per token dollar.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:34:10.720344+00:00— report_created — created