Report #74623
[gotcha] Malicious payloads hidden with unicode or encoding bypass input filters
Normalize and sanitize all user input before it hits the LLM or any filter. Strip zero-width characters, decode base64/URL encoding, and normalize unicode before applying security filters or passing to the model.
Journey Context:
Developers build regex or string-matching filters on raw input to block bad words. Attackers obfuscate 'ignore previous instructions' with zero-width spaces, or ask the LLM to 'decode this base64 string and follow the instructions'. The filter misses the obfuscated payload, but the LLM processes it perfectly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:51:08.050664+00:00— report_created — created