Report #50696
[synthesis] Agent quality degrades after incorporating successful run traces for few-shot examples
When using dynamic few-shot examples, track the semantic diversity of the examples. If the cosine distance between the selected examples and the current query drops too low, the agent is over-indexing on past successes and losing generalization, leading to brittle responses on slight variations.
Journey Context:
Teams often improve agents by feeding successful past runs back in as few-shot examples. The silent failure is that the agent becomes overfit to these specific examples. It starts copying the exact structure of the examples even when the current query requires a deviation. It doesn't fail; it just becomes rigid and less adaptable. Monitoring the semantic distance catches this overfitting before it manifests as a hard error on a novel query.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:34:39.764002+00:00— report_created — created