Report #72115
[frontier] How to select optimal few-shot examples for LLM agents automatically
Apply DSPy's BootstrapFewShot: treat few-shot selection as a discrete optimization problem. Instead of relying on random or fixed examples, use a small labeled set to bootstrap demonstrations and keep those that score well on a task metric (e.g., F1).
Journey Context:
Manual few-shot selection is brittle and doesn't scale across model providers. The frontier approach uses DSPy's optimizers (BootstrapFewShot for demonstrations, COPRO for instructions) to compile a program into an optimized version with tuned prompts and examples. This shifts the work from 'prompt engineering' to 'program optimization'. The optimizer generates candidate demonstrations, evaluates them against a metric, and iterates. Tradeoff: it requires a small labeled dataset (20-100 examples) and compute for bootstrapping, but it removes most of the prompt-tuning guesswork, and the same program can be recompiled for other LLM backends (GPT-4, Claude, Llama).
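The generate, evaluate, keep loop described above can be sketched in plain Python. This is only an illustration of the idea, not DSPy's implementation: `token_f1`, `bootstrap_demos`, `toy_teacher`, and the `threshold` and `max_demos` parameters are hypothetical names chosen for this sketch, and the teacher is a canned lookup standing in for a real LLM call. In DSPy itself the corresponding entry point is, roughly, `BootstrapFewShot(metric=..., max_bootstrapped_demos=...)` followed by `.compile(student, trainset=...)`; check the current DSPy documentation for the exact signature before relying on it.

```python
from collections import Counter
from typing import Callable, Dict, List

Example = Dict[str, str]  # {"question": ..., "answer": ...}


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and a gold answer (case-insensitive)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def bootstrap_demos(
    teacher: Callable[[str], str],
    trainset: List[Example],
    metric: Callable[[str, str], float],
    threshold: float = 0.8,   # hypothetical cutoff for accepting a demo
    max_demos: int = 4,       # hypothetical cap on the demonstration set
) -> List[Example]:
    """Run the teacher on labeled examples and keep outputs that pass the metric."""
    demos: List[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = teacher(example["question"])
        if metric(prediction, example["answer"]) >= threshold:
            demos.append({"question": example["question"], "answer": prediction})
    return demos


def toy_teacher(question: str) -> str:
    """Stand-in for an LLM call; one answer is wrong on purpose."""
    canned = {
        "capital of France?": "Paris",
        "capital of Japan?": "Kyoto",
        "largest planet?": "Jupiter",
    }
    return canned.get(question, "unknown")


trainset = [
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "capital of Japan?", "answer": "Tokyo"},
    {"question": "largest planet?", "answer": "Jupiter"},
]

demos = bootstrap_demos(toy_teacher, trainset, token_f1)
for demo in demos:
    print(demo["question"], "->", demo["answer"])
```

Only the two examples the teacher answers correctly survive as demonstrations; the wrong "Kyoto" answer is filtered out by the metric. A real optimizer would then prepend the surviving demonstrations to the prompt and re-score the compiled program on held-out data.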
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:37:45.362210+00:00— report_created — created