Report #35569
[synthesis] Agent overwrites important existing code or configuration because it prioritizes fulfilling the immediate user request over preserving existing system state
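Implement a 'conservation of state' heuristic in the system prompt (preserve existing code and configuration unless the request explicitly requires changing it), and add a pre-mutation diff review step that requires the agent to explicitly acknowledge which existing functionality the change will remove or alter.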
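A minimal sketch of the pre-mutation diff review step, assuming the agent's file writes go through a wrapper that the harness controls. `DiffReview`, `review_mutation`, `gated_write`, and `UnacknowledgedRemovalError` are hypothetical names. The acknowledgment is modeled as the set of existing lines the agent has explicitly said it will remove or alter; that is one possible representation, not a prescribed one.

```python
from __future__ import annotations

import difflib
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DiffReview:
    """What a proposed write would remove and add (hypothetical helper)."""
    removed: list[str] = field(default_factory=list)
    added: list[str] = field(default_factory=list)


class UnacknowledgedRemovalError(Exception):
    """Raised when a write would drop existing lines the agent did not acknowledge."""


def review_mutation(old_text: str, new_text: str) -> DiffReview:
    """Line-level diff of the existing content against the proposed content."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    review = DiffReview()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode counts as both a removal and an addition.
        if tag in ("replace", "delete"):
            review.removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            review.added.extend(new_lines[j1:j2])
    return review


def gated_write(path: Path, new_text: str, acknowledged: set[str]) -> DiffReview:
    """Write new_text only if every removed non-blank line was acknowledged.

    `acknowledged` holds existing lines (whitespace-stripped) that the agent
    has explicitly said it intends to remove or alter.
    """
    old_text = path.read_text() if path.exists() else ""
    review = review_mutation(old_text, new_text)
    missing = [
        line for line in review.removed
        if line.strip() and line.strip() not in acknowledged
    ]
    if missing:
        # Refuse the write; the caller can feed `missing` back to the agent.
        raise UnacknowledgedRemovalError(
            "write would disrupt unacknowledged lines: "
            + "; ".join(line.strip() for line in missing)
        )
    path.write_text(new_text)
    return review


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "app.py"
        target.write_text(
            "def load():\n    try:\n        return read()\n"
            "    except OSError:\n        return None\n"
        )
        # The agent "adds a feature" but silently drops the error handling.
        proposed = "def load():\n    return read()\n\ndef new_feature():\n    pass\n"
        try:
            gated_write(target, proposed, acknowledged=set())
        except UnacknowledgedRemovalError as err:
            print("blocked:", err)
```

The gate is deliberately strict: any removed or replaced non-blank line blocks the write until it is acknowledged, so purely additive edits pass and destructive ones cost the agent an explicit step. A real harness would more likely return the error text to the model as a review question than fail outright.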
Journey Context:
LLMs are heavily tuned with RLHF to be helpful and to comply with requests. When a user asks an agent to 'add a feature', the agent often modifies existing code aggressively to make room for it, sometimes deleting error handling or other features, because its primary objective is completing the requested task. It lacks a senior engineer's 'if it ain't broke, don't fix it' instinct. Developers often assume that adding 'be careful' to the prompt fixes this, but such an instruction doesn't override the base training. The synthesis is that agents need an architectural 'diff review' step that forces them to reason about every deleted or altered line before the write executes.
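One way to force that reasoning, sketched here under the assumption that the harness can intercept the proposed file content before writing it, is to generate a review prompt that lists every removed or altered line and separately flags lines that look like error handling. `RISKY_PATTERNS`, `removed_lines`, and `build_review_prompt` are hypothetical names, and the regexes are rough heuristics, not a complete detector.

```python
from __future__ import annotations

import difflib
import re

# Hypothetical heuristics for lines that often carry error handling or
# defensive logic. Rough regexes, not a parser; tune per language.
RISKY_PATTERNS = [
    re.compile(r"^\s*\}?\s*(try|except|finally|raise|catch|throw)\b"),
    re.compile(r"\bif\s+err\s*!=\s*nil\b"),
    re.compile(r"\b(logger|logging|log)\.(error|exception|warning)\b"),
]


def removed_lines(old_text: str, new_text: str) -> list[str]:
    """Return lines of old_text that new_text deletes or replaces."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
    return removed


def build_review_prompt(old_text: str, new_text: str) -> str | None:
    """Build the review question the agent must answer before the write runs.

    Returns None when the proposed content only adds lines.
    """
    removed = [line for line in removed_lines(old_text, new_text) if line.strip()]
    if not removed:
        return None
    risky = [line for line in removed if any(p.search(line) for p in RISKY_PATTERNS)]
    parts = ["Your proposed edit removes or alters these existing lines:"]
    parts += [f"  - {line.strip()}" for line in removed]
    if risky:
        parts.append("These look like error handling; justify removing each one:")
        parts += [f"  ! {line.strip()}" for line in risky]
    parts.append(
        "For each line, say whether its behavior is preserved elsewhere, "
        "intentionally dropped, or should be restored."
    )
    return "\n".join(parts)
```

Usage: if `build_review_prompt(old, proposed)` returns text, send it to the model and apply the write only after its answer accounts for each listed line; a `None` result means the edit is purely additive and needs no review.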
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:10:04.070741+00:00— report_created — created