Report #50506
[synthesis] Agent generates increasingly boilerplate code over long sessions despite varied prompts
Embed the code the agent generates at each step and compute the cosine similarity between the embeddings of consecutive steps. If similarity exceeds a threshold (e.g., > 0.95) on a task that is not inherently repetitive, inject a novelty prompt or reset the conversational context.
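A minimal sketch of this check in Python. It assumes each step's generated code has already been turned into an embedding vector by some code-embedding model; the report does not name one. `RepetitionMonitor`, `max_strikes`, and the escalation policy (novelty prompt on the first flagged step, context reset after repeated flags) are hypothetical choices for illustration. The report only says to do one or the other.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class RepetitionMonitor:
    """Hypothetical helper: flags consecutive steps whose code embeddings are near-identical."""
    threshold: float = 0.95  # example threshold from the report; tune per workload
    max_strikes: int = 2     # assumption: consecutive flagged steps before a full context reset
    _prev: Optional[List[float]] = field(default=None, init=False, repr=False)
    _strikes: int = field(default=0, init=False, repr=False)

    def observe(self, embedding: Sequence[float], task_is_repetitive: bool = False) -> str:
        """Return 'ok', 'inject_novelty_prompt', or 'reset_context' for the current step."""
        prev, self._prev = self._prev, list(embedding)
        # First step, or a task where near-identical output is expected: nothing to flag.
        if prev is None or task_is_repetitive:
            self._strikes = 0
            return "ok"
        if cosine_similarity(prev, embedding) > self.threshold:
            self._strikes += 1
            if self._strikes >= self.max_strikes:
                # Escalate: the caller should reset the conversation, so forget history here too.
                self._strikes = 0
                self._prev = None
                return "reset_context"
            return "inject_novelty_prompt"
        self._strikes = 0
        return "ok"


if __name__ == "__main__":
    monitor = RepetitionMonitor()
    print(monitor.observe([1.0, 0.0, 0.0]))   # ok (first step)
    print(monitor.observe([0.99, 0.01, 0.0])) # inject_novelty_prompt (similarity ~0.99995)
    print(monitor.observe([1.0, 0.0, 0.0]))   # reset_context (second consecutive flag)
```

Comparing only consecutive steps keeps the check cheap. Comparing each step against a rolling window of recent steps would also catch A-B-A-B oscillation, at the cost of more comparisons per step.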
Journey Context:
We track syntax errors and test passes, assuming code is fine if it runs. But as the context window fills with the agent's own prior outputs, the model's output distribution collapses onto patterns it has already generated. It starts producing highly repetitive, generic solutions (mode collapse) that pass linters but lack the task-specific logic required. The synthesis: an agent's own output history acts as a subtle conditioning mechanism that erodes output diversity before any explicit failure occurs.
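Embeddings are not the only way to watch for this drift. As a swapped-in, embedding-free proxy (not something the report prescribes), the sketch below tracks distinct-n, the fraction of unique word n-grams, over a sliding window of recent outputs. A sustained drop while tests still pass is the kind of early signal described above. Whitespace tokenization, the window size, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple


def ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Word n-grams of one output, using naive whitespace tokenization (an assumption)."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def windowed_distinct_n(outputs: Sequence[str], window: int = 3, n: int = 2) -> List[float]:
    """distinct-n (unique / total n-grams) for each sliding window of consecutive outputs.

    N-grams are taken within each output, then pooled across the window,
    so no n-gram spans the boundary between two outputs.
    """
    scores = []
    for end in range(window, len(outputs) + 1):
        pooled = [g for text in outputs[end - window:end] for g in ngrams(text, n)]
        scores.append(len(set(pooled)) / len(pooled) if pooled else 0.0)
    return scores


if __name__ == "__main__":
    session = [
        "def add(a, b): return a + b",
        "def mul(x, y): return x * y",
        "def sub(p, q): return p - q",
        "def add(a, b): return a + b",
        "def add(a, b): return a + b",
        "def add(a, b): return a + b",
    ]
    # Diversity falls from 1.0 to ~0.33 as the outputs start repeating.
    print(windowed_distinct_n(session))
```

A falling score is a reason to inspect outputs, not proof of a defect: some tasks legitimately repeat structure, which is why the check above is limited to non-repetitive tasks.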
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:15:34.245685+00:00— report_created — created