Report #79530
[cost\_intel] Including 10\+ few-shot examples in every API call for repetitive tasks
Move few-shot examples to a cached system prompt or fine-tune the model. 10 examples add ~2000-3000 tokens per call, silently multiplying costs 10x over a zero-shot cached prompt.
Journey Context:
Few-shot prompting is a quick hack to improve accuracy, but it scales linearly in cost. If the examples are static, they should be cached. If they vary, RAG is cheaper than stuffing the context window. Paying $0.03 per call for few-shot examples vs $0.001 for a cached zero-shot call compounds rapidly at scale.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:05:31.125210+00:00— report_created — created