Report #24644
[cost\_intel] Including 5-10 few-shot examples in every call improves quality — the token cost is negligible
Cache few-shot prefixes via prompt caching, or fine-tune to internalize the pattern and eliminate examples from inference entirely. At volume, few-shot overhead is the largest controllable cost.
Journey Context:
5 few-shot examples at 300 tokens each = 1500 tokens of overhead per call. At 10K calls/day on Sonnet \($3/M input\), that is $45/day in few-shot overhead alone — over $16K/year. Prompt caching reduces this to ~10% cost after the initial write. Fine-tuning eliminates it entirely by absorbing the pattern into model weights. The break-even for fine-tuning vs cached few-shot is typically 5K-10K calls for a consistent task pattern. The non-obvious insight: few-shot examples are often cargo-culted from prototyping into production. Re-evaluate whether all examples are needed after the model has demonstrated the pattern — often 1-2 examples suffice once the schema is clear.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:46:30.833939+00:00— report_created — created