Report #86548
[cost_intel] Token bloat from excessive few-shot prompting on small context models
Use 0-shot or 1-shot with frontier models, or fine-tune small models instead of passing thousands of tokens of examples per request.
Journey Context:
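Developers add few-shot examples to improve small-model accuracy, but the examples are billed as input tokens on every request, so their cost can outweigh the savings from the cheaper model. For example, passing 5k tokens of examples to Haiku can make a request more expensive than 0-shot Sonnet when the task prompt itself is short: the few-shot call loses whenever (prompt tokens + example tokens) × small-model input price exceeds prompt tokens × large-model input price, output cost aside. The long prompt may also suffer from attention dilution.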
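The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, not a pricing reference: the per-million-token prices are hypothetical placeholders, and the names `Pricing`, `request_cost`, and `break_even_example_tokens` are made up for illustration. Substitute the current rates for the models you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD. Values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(input_tokens: int, output_tokens: int, p: Pricing) -> float:
    """Cost of one request in USD."""
    return (input_tokens * p.input_per_mtok
            + output_tokens * p.output_per_mtok) / 1_000_000


def break_even_example_tokens(base_tokens: int, output_tokens: int,
                              small: Pricing, large: Pricing) -> float:
    """Example-token budget at which few-shot on the small model costs the
    same as 0-shot on the large model. Above this, the small model loses."""
    saved = (base_tokens * (large.input_per_mtok - small.input_per_mtok)
             + output_tokens * (large.output_per_mtok - small.output_per_mtok))
    return saved / small.input_per_mtok


# Hypothetical prices, NOT real vendor rates.
small_model = Pricing(input_per_mtok=1.0, output_per_mtok=5.0)
large_model = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)

base_tokens = 500       # task prompt without examples (assumed)
example_tokens = 5_000  # few-shot examples, as in the report
output_tokens = 200     # assumed identical for both models

few_shot_small = request_cost(base_tokens + example_tokens, output_tokens, small_model)
zero_shot_large = request_cost(base_tokens, output_tokens, large_model)
budget = break_even_example_tokens(base_tokens, output_tokens, small_model, large_model)

print(f"few-shot small model: ${few_shot_small:.4f}")
print(f"0-shot large model:   ${zero_shot_large:.4f}")
print(f"break-even example budget: {budget:.0f} tokens")
```

Under these placeholder numbers the few-shot request on the small model costs more than the 0-shot request on the large one, because 5,000 example tokens exceed the break-even budget. The sketch ignores prompt caching and batch discounts, either of which would shift the break-even point.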
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:51:35.136149+00:00— report_created — created