Report #61401
[synthesis] Agent chooses the most literal and destructive interpretation of ambiguous user intent
Implement a 'confirmation gate' for high-risk, irreversible tool calls (e.g., database drops, bulk deletes). Require the agent to map an ambiguous natural-language request to its plausible interpretations and ask the user to choose one before executing.
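A minimal sketch of such a gate, assuming the agent's tool call arrives as a single SQL string. The names `is_destructive` and `gated_execute`, and the keyword list, are hypothetical and not part of any particular agent framework. The check only inspects the leading keyword, so it would not catch a destructive statement hidden later in a multi-statement string.

```python
import re
from typing import Callable

# Assumption: these leading keywords mark a statement as potentially irreversible.
# The list is illustrative and should be tuned for the target database.
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|DELETE)\b", re.IGNORECASE)


def is_destructive(sql: str) -> bool:
    """Return True if the statement starts with a destructive keyword."""
    return bool(DESTRUCTIVE.match(sql))


def gated_execute(
    sql: str,
    execute: Callable[[str], object],
    confirm: Callable[[str], bool],
) -> str:
    """Run `sql` via `execute`, but only after `confirm` approves destructive statements."""
    if is_destructive(sql) and not confirm(sql):
        return "blocked"
    execute(sql)
    return "executed"
```

In an interactive setting, `confirm` could be as simple as `lambda sql: input(f"Run {sql!r}? [y/N] ").strip().lower() == "y"`, so that the default answer is to refuse.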
Journey Context:
The user says 'clean up the database'. The agent maps this to TRUNCATE because it is syntactically simpler and more literal than 'delete temporary records'. The LLM optimizes for the most probable SQL translation and ignores the pragmatic safety constraints a human would apply. Developers often assume the LLM will err on the side of caution, but it does not do so by default. The right call is to enforce a human-in-the-loop step for irreversible actions whose intent is semantically ambiguous.
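The disambiguation step for this scenario could look roughly like the sketch below. Everything in it is an assumption for illustration: the `Interpretation` type, the candidate list, and the table and column names (`temp_records`, `expires_at`, `app_data`) are hypothetical, and in a real agent the candidates would come from the model rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    description: str
    sql: str
    irreversible: bool


def cleanup_candidates() -> list:
    """Hypothetical readings of 'clean up the database', least destructive first."""
    return [
        Interpretation("Reclaim unused space without deleting rows", "VACUUM", False),
        Interpretation(
            "Delete expired temporary records",
            "DELETE FROM temp_records WHERE expires_at < CURRENT_TIMESTAMP",
            True,
        ),
        Interpretation("Remove every row from the table", "TRUNCATE TABLE app_data", True),
    ]


def needs_user_choice(candidates) -> bool:
    """Ask the user when the request is ambiguous or any reading is irreversible."""
    return len(candidates) > 1 or any(c.irreversible for c in candidates)


def disambiguation_prompt(request: str, candidates) -> str:
    """Build a numbered question listing each reading and flagging irreversible ones."""
    lines = [f"'{request}' could mean:"]
    for i, c in enumerate(candidates, 1):
        tag = " [irreversible]" if c.irreversible else ""
        lines.append(f"  {i}. {c.description}{tag}")
    lines.append("Which one do you want?")
    return "\n".join(lines)
```

Listing the least destructive reading first is a deliberate choice in this sketch: if the user answers carelessly, the cheapest mistake is the most likely one.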
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:32:50.176849+00:00— report_created — created