Report #72580
[cost_intel] Padding prompts with many few-shot examples for tasks that are already well specified
For tasks with clear instructions (classification, structured extraction, formatting), use zero or one example plus detailed instructions instead of 5-10 examples. This typically cuts input tokens by 5-10x with less than a 3% difference in quality. Reserve multi-shot prompting for tasks where the output format is hard to describe but easy to demonstrate.
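A minimal sketch of the zero-shot alternative, assuming a hypothetical four-label support-ticket classifier. The label definitions and a tie-breaking rule do the job the examples would otherwise do, and the reply is checked locally against the allowed labels. `LABELS`, `build_zero_shot_prompt`, and `parse_label` are illustrative names, not part of any library, and no model API is called here.

```python
import json

# Hypothetical label set for a support-ticket classifier (illustrative only).
LABELS = ("billing", "technical", "account", "other")


def build_zero_shot_prompt(ticket: str) -> str:
    """Detailed instructions plus an explicit output format, with no examples."""
    rules = "\n".join([
        "Classify the support ticket into exactly one label.",
        "- billing: charges, refunds, invoices, payment methods.",
        "- technical: errors, crashes, performance, integrations.",
        "- account: login, password, profile, permissions.",
        "- other: anything that fits none of the above.",
        "If a ticket mentions several topics, pick the one the user asks to resolve.",
        'Respond with JSON only: {"label": <one of the labels above>}',
    ])
    return f"{rules}\n\nTicket:\n{ticket}"


def parse_label(raw: str) -> str:
    """Validate the model's reply; every failure surfaces as ValueError
    (json.JSONDecodeError is a subclass of ValueError)."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"label"}:
        raise ValueError(f"unexpected reply shape: {data!r}")
    label = data["label"]
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return label
```

Usage: send `build_zero_shot_prompt("I was charged twice this month.")` to the model, then pass its reply to `parse_label`; a reply of `{"label": "billing"}` yields `"billing"`. Edge cases are fixed by tightening a rule line, which costs a handful of tokens, rather than by appending another example.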
Journey Context:
Few-shot examples are a crutch for underspecified prompts. Engineers add examples reactively to fix edge cases, but those edge cases are better handled by refining the system prompt's instructions. The typical token-bloat pattern: the system prompt is 500 tokens, the examples are 3,000 tokens, and removing 8 of the 10 examples changes quality by less than 2%. At high volume, every call pays for 3,000 example tokens on top of 500 instruction tokens, 6x the cost of the instructions themselves, for negligible gain. The diagnostic: if you can describe the pattern in words, do that. If the pattern is genuinely tacit (vibes, style, voice), examples are justified. For classification and extraction, zero-shot prompting with schema constraints almost always matches few-shot.
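The arithmetic behind the bloat pattern can be sketched as follows. The 500-token system prompt and 3,000 tokens of examples come from the pattern above; splitting them into 10 examples of 300 tokens each, the daily call volume, and the per-token price are assumptions chosen only to make the numbers concrete, not real pricing.

```python
# From the pattern above: 500-token system prompt, 10 examples totalling 3,000 tokens.
SYSTEM_TOKENS = 500
EXAMPLE_TOKENS = 300  # assumption: 3,000 tokens split evenly across 10 examples

# Hypothetical volume and price, for illustration only.
CALLS_PER_DAY = 1_000_000
USD_PER_MILLION_INPUT_TOKENS = 1.00


def input_tokens(n_examples: int) -> int:
    """Fixed prompt overhead per call, excluding the request's own input."""
    return SYSTEM_TOKENS + n_examples * EXAMPLE_TOKENS


def daily_cost(n_examples: int) -> float:
    """Daily spend on prompt overhead at the assumed volume and price."""
    total_tokens = input_tokens(n_examples) * CALLS_PER_DAY
    return total_tokens / 1_000_000 * USD_PER_MILLION_INPUT_TOKENS


for n in (10, 2, 0):
    print(f"{n:>2} examples: {input_tokens(n):>5} tokens/call, ${daily_cost(n):,.2f}/day")
```

Under these assumptions the overhead drops from 3,500 to 1,100 tokens per call when 8 of the 10 examples are removed, and to 500 with none. The real ratio is smaller once each request's own input is added, which is why longer per-request inputs dilute the savings.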
Lifecycle
2026-06-21T04:24:59.039312+00:00— report_created — created