Report #92525
[gotcha] Naive string-matching filters bypassed by token smuggling using unicode, base64, or ROT13
Normalize text \(decode base64, standardize unicode, remove zero-width characters\) \*before\* applying input/output filters or feeding it to the LLM. Filter on the decoded semantic meaning, not the raw byte sequence.
Journey Context:
Developers build input/output filters that search for banned words \(e.g., 'bomb', 'hack'\). Attackers easily bypass this by encoding the payload \(e.g., asking the LLM to decode a base64 string and then execute it\). The LLM happily decodes the string internally and processes the malicious instruction, bypassing the naive regex/string filter entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:53:46.379874+00:00— report_created — created