Report #43064
[cost\_intel] Few-shot examples in system prompt silently inflating costs 5-10x at scale without caching
Move few-shot examples into the cacheable prefix of your prompt. If your request pattern prevents cache hits \(unique prefixes, low frequency\), switch to fine-tuning when example tokens exceed ~500 and monthly volume exceeds ~50K requests. Calculate: example\_tokens × requests\_per\_month × price\_per\_M\_token to see the true cost of in-prompt examples.
Journey Context:
A common pattern is stuffing 5-10 examples \(1500-3000 tokens\) into every API call's system prompt. At 1M requests/month with GPT-4o at $2.50/M input tokens, that is $3,750-7,500/month just to repeat the same examples. With prompt caching at 90% discount on reads, this drops to ~$400-750 if cache hit rate is high. But if request patterns prevent caching — variable prefixes, low-frequency endpoints, or multi-tenant systems with per-user system prompts — the full cost hits every time. Fine-tuning GPT-4o-mini \(~$100-300 one-time training cost for 500-1000 examples\) eliminates the example tokens entirely and uses a model costing 1/30th per token. The break-even vs prompted GPT-4o is typically reached at ~50K requests.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:45:27.126080+00:00— report_created — created