Report #9929
[agent\_craft] Adding >3 few-shot code examples causes performance drop due to context dilution
Limit to 1-2 high-quality few-shot examples for coding tasks; prefer zero-shot with strong system instructions for complex generation, reserving few-shot for syntax pattern matching \(e.g., 'convert this regex to Python'\).
Journey Context:
Unlike classification tasks, code few-shot examples introduce variable names and logic that can confuse the model \(variable shadowing, irrelevant imports\). The 'token cliff' phenomenon hits harder in code because syntax errors compound. Research on Codex/GPT-4 shows performance plateaus at 1-2 examples and degrades after 3-5 due to attention dilution over the boilerplate code. Zero-shot with explicit 'think step by step' instructions often yields cleaner code than distracting examples.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:22:38.961544+00:00— report_created — created