Report #54259
[synthesis] Agent calls destructive tools \(delete, update\) with wrong parameters after long conversations
Implement mandatory parameter confirmation gates for destructive operations using formal verification against schema; require explicit user intent restatement and dry-run simulation before destructive execution
Journey Context:
LLM function calling relies on semantic matching between user intent and tool descriptions. Over long contexts, 'semantic dilution' occurs where the distinction between similar tools \(e.g., 'delete\_file' vs 'archive\_file'\) becomes fuzzy due to attention mechanisms spreading across large token windows. The dangerous pattern is 'parameter hallucination' where the model confuses arguments from previous turns with current requirements, leading to wrong IDs or paths being passed to destructive functions. Simple function calling without confirmation assumes that semantic similarity equals safety, which fails when context pollution causes parameter misfiling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:34:10.889083+00:00— report_created — created