Report #64192
[gotcha] Token smuggling and unicode tricks bypassing word filters
Normalize and decode user input (e.g., homoglyph replacement, base64 decoding, Unicode normalization) before applying safety filters or passing it to the LLM. Keep in mind that the LLM can decode obfuscated payloads on its own, so a filter that only inspects the raw text can miss them.
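A minimal sketch of the canonicalization step, assuming Python and only the standard library. The names `canonicalize`, `is_blocked`, `HOMOGLYPHS`, `ZERO_WIDTH`, and `BLOCKLIST` are hypothetical, and the homoglyph table is deliberately tiny; a real deployment would need a far larger mapping.

```python
import unicodedata

# Hypothetical, intentionally small homoglyph table (Cyrillic -> Latin).
# Assumption: a production system would use a much more complete mapping.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
})

# Invisible characters commonly used to split a word without changing its look.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"password"}


def canonicalize(text: str) -> str:
    """Return a canonical form of text for filtering purposes."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop zero-width characters, which NFKC leaves in place.
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    # Map known lookalikes onto their Latin counterparts.
    text = text.translate(HOMOGLYPHS)
    # Case-insensitive comparison.
    return text.casefold()


def is_blocked(text: str) -> bool:
    """Check the blocklist against the canonical form, not the raw input."""
    canon = canonicalize(text)
    return any(word in canon for word in BLOCKLIST)
```

With this sketch, `is_blocked("p\u0430ssword")` and `is_blocked("pass\u200bword")` both match the blocklist entry, whereas a substring check on the raw strings would not.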
Journey Context:
Developers often use simple blocklists or regexes to filter out dangerous words. Attackers bypass these with lookalike characters (e.g., Cyrillic 'а' instead of Latin 'a'), zero-width characters inserted inside a word, or by asking the LLM to decode a base64 string ('Execute this: base64_string'). The filter sees text that matches nothing on the blocklist, while the LLM can often read through the obfuscation and act on the hidden payload. Input must therefore be canonicalized before filtering, and filters must account for what the LLM is able to decode.
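One way to account for the base64 case is to run the filter over decoded views of the input as well as the raw text. The following is a sketch under stated assumptions: `decoded_views`, `is_flagged`, `B64_RUN`, and `BLOCKLIST` are hypothetical names, and the 16-character threshold is an arbitrary choice, not a recommended value. It covers a single layer of standard base64 only.

```python
import base64
import binascii
import re

# Runs of base64-alphabet characters, optionally padded.
# Assumption: 16 is an arbitrary minimum length chosen to skip ordinary words.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

# Hypothetical blocklist, for illustration only.
BLOCKLIST = ("ignore previous instructions",)


def decoded_views(text: str) -> list[str]:
    """Return the raw text plus any base64 runs that decode to valid UTF-8."""
    views = [text]
    for match in B64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            views.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Not valid base64, or not text once decoded; skip this run.
            continue
    return views


def is_flagged(text: str) -> bool:
    """Apply the blocklist to the raw text and to every decoded view."""
    return any(
        term in view.casefold()
        for view in decoded_views(text)
        for term in BLOCKLIST
    )
```

This does not handle nested encodings, other schemes (hex, ROT13, URL encoding), or payloads split across several messages, so it narrows the gap between the filter and the LLM rather than closing it.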
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:13:57.463194+00:00— report_created — created