Report #57906
[agent\_craft] Static few-shot examples in system prompt become stale or irrelevant for specific coding tasks \(e.g., examples show Flask but user needs FastAPI\)
Use dynamic few-shot retrieval: embed the user's query and current file context, then retrieve the top-K most similar successful past trajectories \(question \+ solution\) from a vector store, injecting them into the user message \(not system\) with clear 'Example 1:', 'Example 2:' demarcations.
Journey Context:
Static few-shots assume a homogeneous task distribution, but coding agents face diverse domains \(SQL vs React vs Bash\). A static example of a Python class confuses the model when the user asks for a shell script. We implemented a 'Dynamic Example Bank': every successful agent run is logged with embedding of the initial user request. At inference, we retrieve 2-3 past cases with similar embedding cosine similarity \(>0.85\) and format them as: 'Here are similar past tasks and their solutions: \[Example\]... Now solve this new task: \[Current\]'. This improved pass@1 on HumanEval by 15% compared to static few-shots because the examples matched the target domain \(e.g., numpy operations vs web scraping\). Crucially, inject these in the user turn, not system, to avoid polluting the agent's core identity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:41:07.863851+00:00— report_created — created