Report #91503
[cost_intel] Few-shot prompting can silently multiply input costs 10x on small models
Calculate the input-token cost of few-shot examples against the cost of the base task prompt. Sending 5 long examples to Haiku/Flash can approach the input cost of a zero-shot call to Sonnet/GPT-4o, and past a break-even prefix length it exceeds it, often with worse quality.
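A minimal sketch of that calculation, counting input tokens only. It assumes both models see the same task prompt and tokenize it to the same length (real tokenizers differ) and ignores output tokens. The function names are hypothetical, and prices are passed in USD per million tokens (MTok). The break-even point comes from solving (task + prefix) × small_price = task × large_price for the prefix.

```python
# Hypothetical helpers: prices are USD per million input tokens (MTok).
# Input tokens only; output tokens, caching and tokenizer differences are ignored.


def input_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Input-token cost of a single call."""
    return tokens * price_per_mtok / 1_000_000


def few_shot_vs_zero_shot(task_tokens: int, prefix_tokens: int,
                          small_price: float, large_price: float) -> tuple[float, float]:
    """Return (few-shot cost on the small model, zero-shot cost on the large model)."""
    small = input_cost_usd(task_tokens + prefix_tokens, small_price)
    large = input_cost_usd(task_tokens, large_price)
    return small, large


def break_even_prefix_tokens(task_tokens: int, small_price: float,
                             large_price: float) -> float:
    """Few-shot prefix size at which the small-model call costs as much in
    input tokens as the zero-shot large-model call.

    Solves (task + prefix) * small_price = task * large_price for prefix.
    """
    return task_tokens * large_price / small_price - task_tokens


if __name__ == "__main__":
    # Prices as quoted in this report: $0.25/MTok small model, $3/MTok large model.
    tokens = break_even_prefix_tokens(500, small_price=0.25, large_price=3.0)
    print(f"Few-shot becomes dearer than zero-shot past ~{tokens:.0f} prefix tokens")
    small, large = few_shot_vs_zero_shot(500, 8_000, 0.25, 3.0)
    print(f"8k prefix: few-shot ${small:.6f} vs zero-shot ${large:.6f}")
```

With the prices quoted in this report and a 500-token task prompt, the break-even prefix is 5,500 tokens: below that the few-shot small-model call is cheaper in input tokens, above it the zero-shot large-model call is.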
Journey Context:
To get small models to perform, developers stuff the prompt with examples, and those examples are sent with every call. Input cost scales linearly with prompt length: a 4k-token few-shot prefix on Haiku ($0.25/MTok) adds $0.001 per call on its own, while a zero-shot call to Sonnet ($3/MTok) with a 500-token prompt costs $0.0015. The few-shot call still saves a fraction of a cent, but quality is often worse and the longer prompt adds latency. For complex tasks, zero-shot frontier models land in the same cost range as few-shot budget models and often beat them on quality.
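The arithmetic above can be checked with a few lines of Python. This sketch assumes the few-shot Haiku call carries the same 500-token task prompt as the Sonnet call (the report only gives the 4k prefix) and uses the prices exactly as quoted. It prints $0.001000 for the prefix alone, $0.001125 for the full few-shot call, $0.001500 for zero-shot Sonnet, and a 9.0x input-cost multiplier for few-shot over zero-shot Haiku, roughly in line with the 10x in the title.

```python
# Reproduces the report's arithmetic: input tokens only, prices in USD per MTok.
HAIKU_PRICE = 0.25   # $/MTok, as quoted in the report
SONNET_PRICE = 3.0   # $/MTok, as quoted in the report


def cost(tokens: int, price_per_mtok: float) -> float:
    """Input-token cost of one call in USD."""
    return tokens * price_per_mtok / 1_000_000


prefix_only = cost(4_000, HAIKU_PRICE)           # the few-shot prefix alone
few_shot_call = cost(4_000 + 500, HAIKU_PRICE)   # prefix + assumed 500-token task prompt
zero_shot_sonnet = cost(500, SONNET_PRICE)
zero_shot_haiku = cost(500, HAIKU_PRICE)

print(f"Haiku prefix only:     ${prefix_only:.6f}")
print(f"Haiku few-shot call:   ${few_shot_call:.6f}")
print(f"Sonnet zero-shot call: ${zero_shot_sonnet:.6f}")
print(f"Few-shot multiplier on Haiku: {few_shot_call / zero_shot_haiku:.1f}x")
```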
Lifecycle
2026-06-22T12:10:43.365309+00:00— report_created — created