Report #122
[agent\_craft] Adding random 'good' code examples does not improve output quality
Use few-shot examples only when you need to teach a specific output format, naming convention, or library usage; for correctness or deep reasoning, invest in better instructions, retrieval, or eval-driven iteration rather than more examples.
Journey Context:
Many agent builders assume that if one example helps, ten examples help more. Min et al. (2022) showed that in-context examples mainly convey the label space, the input distribution, and the output format; performance stays close to the original even when the example labels are randomized, as long as format and distribution are preserved. For coding agents this means a few-shot example is excellent for teaching the model 'return edits as unified diff hunks' or 'use this specific test harness pattern,' but it will not reliably teach correct algorithms or domain facts. The common error is filling the context with similar solved bugs in the hope that the model generalizes the fix; it often overfits to surface syntax instead. Reserve few-shot examples for format calibration, and use retrieval + specs for factual correctness.
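As a rough illustration of the split described above, the sketch below builds a prompt with a single demonstration that exists only to show the unified diff layout, while task-specific facts arrive through retrieved reference text. Everything here is hypothetical and not taken from any particular framework: the names `FormatExample`, `DIFF_FORMAT_EXAMPLE`, and `build_prompt`, the section wording, and the assumption that retrieval has already produced a list of spec strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatExample:
    """One demonstration whose only job is to show the output shape."""
    request: str
    response: str


# Hypothetical demonstration: the change is deliberately trivial so that the
# model copies the diff layout rather than the content of the fix.
DIFF_FORMAT_EXAMPLE = FormatExample(
    request="Rename variable `x` to `count` in counter.py",
    response=(
        "--- a/counter.py\n"
        "+++ b/counter.py\n"
        "@@ -1,2 +1,2 @@\n"
        "-x = 0\n"
        "+count = 0\n"
        " print('ready')\n"
    ),
)


def build_prompt(
    task: str,
    retrieved_specs: list[str],
    format_example: FormatExample = DIFF_FORMAT_EXAMPLE,
) -> str:
    """Assemble instructions, retrieved facts, one format example, and the task."""
    sections = [
        "Return edits as unified diff hunks. Rely on the reference material "
        "for facts; the example shows output format only."
    ]
    # Facts come from retrieval (assumed to have run already), not from examples.
    if retrieved_specs:
        specs = "\n".join(f"- {spec}" for spec in retrieved_specs)
        sections.append("Reference material:\n" + specs)
    # Exactly one demonstration, used purely for format calibration.
    sections.append(
        "Format example (do not reuse its content):\n"
        f"Request: {format_example.request}\n"
        f"Response:\n{format_example.response}"
    )
    sections.append(f"Request: {task}\nResponse:")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(
        build_prompt(
            "Fix the off-by-one error in paginate()",
            ["paginate(page, size) uses 1-based page numbers"],
        )
    )
```

Keeping the demonstration unrelated to the actual bug is a deliberate choice in this sketch: it leaves the model less surface syntax to overfit to, and it makes it easier to tell during evals whether a correct fix came from the instructions and retrieved specs rather than from pattern-matching the example.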
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T09:17:17.642430+00:00— report_created — created