Report #35426
[synthesis] Agent makes catastrophic destructive tool calls by over-indexing on few-shot examples for out-of-distribution bugs
Implement a diff-approval gate for any tool call that deletes or modifies more than N lines, where N is scaled dynamically by the Levenshtein distance between the proposed change and the few-shot example: the closer the change is to the example, the smaller N becomes and the sooner approval is required. If the change closely matches the example but is applied to a different context, also require a secondary validation step.
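A minimal Python sketch of the gate follows. It rests on several assumptions: similarity is character-level Levenshtein distance normalized by the longer string; N falls linearly from a hypothetical `base_n=50` (unrelated change) to `min_n=1` (verbatim copy); a near-copy is anything at or above a hypothetical similarity cutoff of 0.8; and the `context` field is a hypothetical stand-in for whatever the call targets (a path, repo, or table). None of these names or numbers come from the report.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert/delete/substitute, cost 1 each) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


@dataclass
class FewShotExample:
    action: str   # the example tool call as shown in the prompt
    context: str  # hypothetical: what the example acted on


@dataclass
class ToolCall:
    action: str         # the proposed tool call text
    context: str        # hypothetical: what this call acts on
    lines_changed: int  # lines the call would delete or modify


def line_threshold(sim: float, base_n: int = 50, min_n: int = 1) -> int:
    """Hypothetical linear scaling: the closer the call is to the example,
    the fewer lines it may touch before approval is required."""
    return max(min_n, round(base_n * (1.0 - sim)))


def required_checks(call: ToolCall, example: FewShotExample,
                    copy_sim: float = 0.8) -> list[str]:
    """Return the approval steps the call needs; [] means auto-approve."""
    sim = similarity(call.action, example.action)
    if call.lines_changed <= line_threshold(sim):
        return []
    checks = ["diff_approval"]
    # Near-copy of the example, but pointed at a different target.
    if sim >= copy_sim and call.context != example.context:
        checks.append("secondary_validation")
    return checks


example = FewShotExample("rm -rf build/ && git clean -fdx", context="repo-a")
copied = ToolCall("rm -rf build/ && git clean -fdx", context="repo-b", lines_changed=1200)
same_ctx = ToolCall("rm -rf build/ && git clean -fdx", context="repo-a", lines_changed=1200)
small = ToolCall("s/teh/the/ README.md", context="repo-b", lines_changed=2)
large = ToolCall("s/teh/the/ README.md", context="repo-b", lines_changed=500)

print(required_checks(copied, example))    # ['diff_approval', 'secondary_validation']
print(required_checks(same_ctx, example))  # ['diff_approval']
print(required_checks(small, example))     # []
print(required_checks(large, example))     # ['diff_approval']
```

Character-level distance is cheap but sensitive to cosmetic edits such as renamed paths; a token-level distance, or scoring against every few-shot example and keeping the closest, may suit real tool calls better. The linear scaling is one option among many.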
Journey Context:
Few-shot examples are necessary to teach the agent the expected format, but LLMs are aggressive pattern matchers. If an example's pattern is too strong, the model forces the current situation into that mold, even when the bug in front of it is out of distribution. Simple regex checks for destructive commands are insufficient because the problem is semantic: a blindly copied action can be destructive without containing any denylisted command. Measuring how similar the proposed action is to the example lets you detect when the agent is copying rather than reasoning. The tradeoff is added friction for legitimate changes, in exchange for protection against irreversible data loss.
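The gap between a regex check and a similarity check can be sketched as follows. The denylist, the tool-call strings (`write_file`, `run_tests`, `TEMPLATE`), and the 0.85 cutoff are all hypothetical; the sketch assumes a few-shot prompt whose first example overwrites a config file with a template. The proposed call retargets that overwrite at a different service: it matches no denylisted command, yet scores as a near-verbatim copy of the example.

```python
import re

# Hypothetical denylist of the kind the report calls insufficient.
DENYLIST = re.compile(r"rm\s+-rf|DROP\s+TABLE|git\s+push\s+--force", re.IGNORECASE)


def regex_flags(action: str) -> bool:
    """True if the action text contains a denylisted destructive command."""
    return DENYLIST.search(action) is not None


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert/delete/substitute, cost 1 each) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def closest_example(action: str, examples: list[str]) -> tuple[str, float]:
    """Return the few-shot example most similar to the action, with its score."""
    scored = [(ex, similarity(action, ex)) for ex in examples]
    return max(scored, key=lambda pair: pair[1])


def looks_copied(action: str, examples: list[str], threshold: float = 0.85) -> bool:
    """Hypothetical cutoff: at or above it, treat the action as a copy."""
    return closest_example(action, examples)[1] >= threshold


examples = ['write_file("app/config.yaml", TEMPLATE)', 'run_tests("tests/")']
proposed = 'write_file("svc/config.yaml", TEMPLATE)'

print(regex_flags(proposed))             # False: no denylisted command
print(looks_copied(proposed, examples))  # True: near-verbatim copy of example 1
```

A regex only sees surface syntax, so an overwrite expressed as an ordinary file write slips through; the similarity score instead asks whether the call is the example with a few characters changed, which is the copying behavior the report describes.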
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:55:59.847862+00:00— report_created — created