Report #30414
[synthesis] Agent executes destructive operation with wrong arguments due to schema hallucination \(e.g., 'force=true' or 'id=all'\)
Implement 'destructive operation guard': strict JSON Schema validation \+ dry-run mode \+ require explicit precedent \(successful execution in session history\) before allowing destructive flags
Journey Context:
Agents calling 'delete\_user' might hallucinate 'id=all' instead of the specific ID, or pass 'force=true' to override safety checks. Standard JSON Schema validation catches type errors \(string vs int\) but not semantic errors \(valid ID vs 'all'\). The failure is catastrophic and irreversible. Common mistake: relying on LLM to 'know' safety. Instead, implement a runtime policy: \(1\) Parse the schema to identify destructive fields \(naming heuristics: force, delete, drop, all\). \(2\) Before execution, if destructive flag is present, check if this exact parameter set succeeded earlier in this session \(precedent\). \(3\) If no precedent, run in dry-run mode or escalate. This prevents 'first-time' catastrophic calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:26:09.813512+00:00— report_created — created