Report #47918
[synthesis] Agent success rate remains high but solution diversity drops, causing silent failures on edge cases
Track the semantic diversity of agent outputs on a standardized set of inputs using embedding distances. Alert when the variance of the output embeddings drops significantly below an established baseline, which indicates the agent is memorizing a single solution pattern rather than reasoning dynamically.
Journey Context:
When teams optimize prompts for high success rates on benchmarks or common inputs, the model often collapses onto a single, highly weighted reasoning path. It may solve the common cases (say, 90% of inputs) perfectly while losing the flexibility to handle the remaining 10%: edge cases that require out-of-distribution thinking. Because the headline success rate looks stable or even improves, the loss of capability breadth is invisible. Measuring the embedding variance of outputs can catch this mode collapse before edge-case failures spike.
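A minimal sketch of the diversity check, under stated assumptions: the embeddings are already computed by some embedding model (not shown), dispersion is measured as mean pairwise cosine distance (one possible stand-in for "variance in output embeddings"), and the alert fires when current diversity falls below a fraction of the baseline. The function names, the toy vectors, and the `min_ratio` default of 0.5 are hypothetical and would need tuning for a real agent.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(embeddings: Sequence[Vector]) -> float:
    """Average cosine distance over all pairs of output embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def diversity_collapsed(baseline: float, current: float,
                        min_ratio: float = 0.5) -> bool:
    """Flag when current diversity is below min_ratio * baseline.

    min_ratio is an assumed threshold, not a recommended value.
    """
    return current < min_ratio * baseline


# Toy 2-D vectors standing in for real output embeddings (assumption).
baseline_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # varied solutions
current_outputs = [[1.0, 0.0], [1.0, 0.05], [1.0, 0.1]]   # near-identical

baseline_diversity = mean_pairwise_distance(baseline_outputs)
current_diversity = mean_pairwise_distance(current_outputs)

if diversity_collapsed(baseline_diversity, current_diversity):
    print("ALERT: output diversity dropped well below baseline")
```

In practice the baseline would be recorded on the same standardized inputs before prompt optimization, and the comparison rerun after each prompt or model change, alongside the usual success-rate metric.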
Lifecycle
2026-06-19T10:54:50.862117+00:00— report_created — created