Report #68232
[synthesis] Agent output diversity decreases over time while remaining technically valid
Compute the average pairwise semantic similarity (e.g., cosine similarity of embeddings) of the agent's outputs over a rolling window. Alert when the similarity trends upward, indicating semantic collapse.
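The check described above can be sketched roughly as follows. This is a minimal illustration, not a verified workaround: it assumes each output has already been turned into an embedding vector by whatever embedding model you use (not shown here), and the names `DiversityMonitor`, `window`, `history`, and `slope_threshold` are hypothetical, with default values that are placeholders to tune against your own data.

```python
import math
from collections import deque
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all distinct pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def slope(values):
    """Least-squares slope of values against their index 0..n-1 (needs n >= 2)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


class DiversityMonitor:
    """Hypothetical helper: tracks rolling similarity and flags an upward trend."""

    def __init__(self, window=50, history=20, slope_threshold=0.005):
        if window < 2 or history < 2:
            raise ValueError("window and history must both be at least 2")
        self.window = deque(maxlen=window)    # most recent output embeddings
        self.scores = deque(maxlen=history)   # recent rolling-similarity scores
        self.slope_threshold = slope_threshold  # assumed value, tune per agent

    def observe(self, embedding):
        """Record one output embedding; return True if similarity is trending up."""
        self.window.append(list(embedding))
        if len(self.window) < self.window.maxlen:
            return False  # window not full yet
        self.scores.append(mean_pairwise_similarity(self.window))
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough history to estimate a trend
        return slope(list(self.scores)) > self.slope_threshold
```

Call `observe()` once per agent output and raise an alert when it returns True. The pairwise computation is quadratic in the window size, so a window of a few dozen to a few hundred outputs is likely the practical range for this naive version.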
Journey Context:
Agents designed for creative tasks, test generation, or diverse data synthesis often fall into ruts due to model updates or subtle system prompt drift. They start producing variations of the same few outputs. Because these outputs perfectly satisfy the validation criteria, standard validators pass them. The system looks healthy, but the agent's utility has fundamentally degraded. Only an entropy or semantic diversity metric can catch this silent collapse.
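As a cheaper complement to embedding similarity, the entropy idea can be sketched as below. This is an assumption-laden illustration: `normalized_entropy` is a hypothetical helper that treats outputs as identical only when they match after trimming and lowercasing, so it catches verbatim repetition but not paraphrased near-duplicates, which still need the embedding-based check.

```python
import math
from collections import Counter


def normalized_entropy(outputs):
    """Shannon entropy of the outputs in a window, scaled to the range [0, 1].

    1.0 means every output in the window is distinct; 0.0 means they are all
    the same. Matching is exact after trimming and lowercasing (an assumption).
    """
    counts = Counter(o.strip().lower() for o in outputs)
    total = sum(counts.values())
    if total < 2 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(total)  # log2(total) is the maximum possible entropy
```

A window in which every output passes validation can still score near 0.0 here, which is the silent collapse described above: the validator and the diversity metric measure different things, so both need to be monitored.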
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:00:39.366456+00:00— report_created — created