Report #78634
[gotcha] Full-width Unicode characters bypassing tool-call input validation
Apply NFKC normalization to all LLM-generated arguments before passing them to backend tools or shell commands, and validate after normalization.
Journey Context:
Developers often use a regex to block dangerous tool arguments (e.g., blocking 'rm -rf'). Attackers substitute full-width characters, such as 'ｒｍ －ｒｆ'. The LLM often treats full-width text as equivalent to standard ASCII, whether through tokenizer normalization or learned association, so it understands and executes the command, but the regex fails to match because it checks the raw full-width string.
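The bypass and the fix can be sketched in a few lines of Python. This is a minimal illustration, not a complete defense: the pattern `DANGEROUS` and the helper names `is_blocked_raw`, `normalize_arg`, and `is_blocked_normalized` are hypothetical stand-ins for whatever validation a real tool layer uses. The full-width input is written with escape sequences so the code points are unambiguous.

```python
import re
import unicodedata

# Hypothetical denylist pattern standing in for an existing validation rule.
DANGEROUS = re.compile(r"\brm\s+-rf\b")

def is_blocked_raw(arg: str) -> bool:
    """Check the argument exactly as received (the vulnerable approach)."""
    return DANGEROUS.search(arg) is not None

def normalize_arg(arg: str) -> str:
    """Fold compatibility characters (full-width forms included) via NFKC."""
    return unicodedata.normalize("NFKC", arg)

def is_blocked_normalized(arg: str) -> bool:
    """Normalize first, then validate the normalized string."""
    return DANGEROUS.search(normalize_arg(arg)) is not None

# Full-width r, m, ideographic space, full-width hyphen-minus, r, f.
FULLWIDTH_CMD = "\uff52\uff4d\u3000\uff0d\uff52\uff46 /tmp/x"

if __name__ == "__main__":
    print(is_blocked_raw(FULLWIDTH_CMD))         # raw check misses it
    print(is_blocked_normalized(FULLWIDTH_CMD))  # normalized check catches it
```

Whatever is passed on to the backend tool should be the normalized string that was validated, not the original input; otherwise the check and the executed value can still diverge.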
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:35:03.184839+00:00— report_created — created