Report #91352
[cost_intel] Including few-shot examples in every API call at production scale
For tasks exceeding 50K calls/month with stable patterns, either fine-tune a smaller model on the examples or distill the pattern into explicit rules. Five 400-token examples in every call add 2,000 input tokens per call. At 1M calls/month that is 2B extra input tokens, or ~$6,000/month on Sonnet (which implies roughly $3 per million input tokens), spent on context that is identical in every call and that the model has to re-read each time.
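The arithmetic above can be reproduced with a short sketch. The function name and parameters are hypothetical, and the $3 per million input tokens rate is an assumption back-derived from the report's own figures ($6,000 for 2B tokens); substitute your provider's current pricing.

```python
def monthly_example_cost(num_examples, tokens_per_example,
                         calls_per_month, usd_per_million_input_tokens):
    """Estimate the monthly cost of resending few-shot examples on every call.

    Returns (extra_tokens_per_call, extra_tokens_per_month, usd_per_month).
    """
    extra_tokens_per_call = num_examples * tokens_per_example
    extra_tokens_per_month = extra_tokens_per_call * calls_per_month
    usd_per_month = extra_tokens_per_month / 1_000_000 * usd_per_million_input_tokens
    return extra_tokens_per_call, extra_tokens_per_month, usd_per_month


# Figures from the report: five 400-token examples, 1M calls/month.
# The 3.0 USD rate is an assumed price, not a quoted one.
per_call, per_month, usd = monthly_example_cost(5, 400, 1_000_000, 3.0)
print(f"{per_call} tokens/call, {per_month:,} tokens/month, ${usd:,.0f}/month")
```

Because the cost is a straight product, halving either the number of examples or their length halves the bill, which is why shortening the examples is worth trying even before fine-tuning.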
Journey Context:
Few-shot prompting is the right move during prototyping: it's fast to iterate and clearly communicates intent. But at production scale, those examples become a silent cost multiplier that grows linearly with volume. The fix has two paths: (a) fine-tune GPT-4o-mini or Haiku on 500-2000 examples, which internalizes the pattern and lets you drop the examples from the prompt, typically reducing per-call tokens by 60-80% with 90-95% quality retention; or (b) convert the examples into explicit rules/instructions that are 5-10x shorter. Path (a) has an upfront training cost of $100-500 but pays back in weeks at production volume. Path (b) is free but requires careful prompt engineering to avoid quality regression.
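To check the "pays back in weeks" claim for path (a) against your own volume, a rough break-even sketch follows. It assumes the only saving is the dropped example tokens at an assumed $3 per million input tokens, and it ignores any per-token price difference for the fine-tuned model; the function name and parameters are hypothetical.

```python
def payback_days(upfront_cost_usd, monthly_savings_usd, days_per_month=30):
    """Days until an upfront fine-tuning cost is recovered by monthly savings."""
    if monthly_savings_usd <= 0:
        raise ValueError("monthly savings must be positive to ever break even")
    return upfront_cost_usd * days_per_month / monthly_savings_usd


# At the 50K calls/month threshold with 2,000 example tokens per call:
# 50_000 * 2_000 = 100M tokens/month, i.e. $300/month at the assumed rate.
savings_at_threshold = 50_000 * 2_000 / 1_000_000 * 3.0

# Upper end of the report's $100-500 training cost range.
print(payback_days(500, savings_at_threshold))
```

Under these assumptions the upper-end training cost takes about seven weeks to recover at 50K calls/month, and only a few days at 1M calls/month, so the payback period is dominated by call volume rather than by the training cost.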
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:55:37.842080+00:00— report_created — created