Report #52727
[gotcha] I filter user input for suspicious keywords, so encoded injection attacks won't work
Normalize and decode all user input before filtering. Handle base64, ROT13, URL encoding, Unicode normalization (NFKC folds compatibility forms such as fullwidth letters; NFC alone does not), zero-width characters, homoglyph substitution, and reversed text. Filter on the decoded canonical form, not the raw input. Remember that LLMs are general-purpose decoders: if a human could read it, the LLM almost certainly can too.
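A minimal sketch of the decode-then-filter idea, using only the Python standard library. The names `BLOCKED_PHRASES`, `canonicalize`, `candidate_forms`, and `is_suspicious` are hypothetical and chosen for illustration; the one-phrase blocklist is a placeholder, and the sketch peels off only a single encoding layer.

```python
import base64
import binascii
import codecs
import unicodedata
from typing import Optional, Set
from urllib.parse import unquote

# Hypothetical blocklist for illustration; a real filter would be broader.
BLOCKED_PHRASES = ("ignore previous instructions",)

# Zero-width / invisible characters often inserted to split keywords.
# Mapping each code point to None makes str.translate delete it.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def canonicalize(text: str) -> str:
    """NFKC-normalize, drop zero-width characters, and case-fold."""
    text = unicodedata.normalize("NFKC", text)
    return text.translate(ZERO_WIDTH).casefold()


def try_base64(text: str) -> Optional[str]:
    """Return the UTF-8 text if the whole input is valid base64, else None."""
    try:
        return base64.b64decode(text.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def candidate_forms(raw: str) -> Set[str]:
    """Collect the canonical form plus a few single-layer decodings."""
    base = canonicalize(raw)
    forms = {
        base,
        base[::-1],                     # reversed text
        codecs.decode(base, "rot_13"),  # ROT13 is its own inverse
        canonicalize(unquote(raw)),     # URL percent-encoding
    }
    decoded = try_base64(raw)           # base64 is case-sensitive: use raw
    if decoded is not None:
        forms.add(canonicalize(decoded))
    return forms


def is_suspicious(raw: str) -> bool:
    """True if any candidate form contains a blocked phrase."""
    return any(
        phrase in form
        for form in candidate_forms(raw)
        for phrase in BLOCKED_PHRASES
    )
```

This sketch does not fold homoglyphs, does not find an encoded payload embedded inside otherwise plain text, and does not unwrap nested encodings (for example base64 inside URL encoding). Treat it as one defensive layer, not a complete defense.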
Journey Context:
LLMs are remarkably good at decoding encoded text: they can read base64, ROT13, hex, Unicode small caps, and even custom ciphers. Attackers use this to bypass input filters that scan for suspicious keywords or patterns. A filter looking for 'ignore previous instructions' will not catch 'aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==' (base64) or 'ɪɢɴᴏʀᴇ ᴘʀᴇᴠɪᴏᴜꜱ ɪɴꜱᴛʀᴜᴄᴛɪᴏɴꜱ' (Unicode small caps) or 'snoitcurtsni suoiverp erongi' (reversed). Zero-width characters can be inserted between letters to break keyword matching while the LLM still reads the word correctly. Unicode homoglyphs (Cyrillic 'а' vs Latin 'a') defeat exact-match filters. This is the LLM equivalent of SQL injection via encoding, but with far more encoding options, because the LLM is a general-purpose text processor that was trained on all of these encodings.
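Unicode normalization on its own does not cover the homoglyph and small-caps cases: NFKC leaves Cyrillic 'а' as it is, so lookalikes need an explicit mapping. The sketch below uses a small hand-picked table; `CYRILLIC_LOOKALIKES`, `SMALL_CAPS`, and `fold_lookalikes` are hypothetical names, and the table covers only the characters needed for the examples above. Broader coverage would need a much larger table, for instance one derived from Unicode's confusables data.

```python
import unicodedata

# Illustrative subset only: Cyrillic letters that look like Latin ones.
CYRILLIC_LOOKALIKES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0445": "x",
}

# Illustrative subset only: small-capital letters mapped to lowercase Latin.
SMALL_CAPS = {
    "\u026a": "i", "\u0262": "g", "\u0274": "n", "\u1d0f": "o",
    "\u0280": "r", "\u1d07": "e", "\u1d18": "p", "\u1d20": "v",
    "\u1d1c": "u", "\ua731": "s", "\u1d1b": "t", "\u1d04": "c",
}

FOLD_TABLE = str.maketrans({**CYRILLIC_LOOKALIKES, **SMALL_CAPS})


def fold_lookalikes(text: str) -> str:
    """NFKC-normalize, map known lookalikes to Latin, then case-fold."""
    normalized = unicodedata.normalize("NFKC", text)
    return normalized.translate(FOLD_TABLE).casefold()
```

Folding like this is meant only for the copy of the input that the filter inspects. Applying it to the text that is stored or displayed would corrupt legitimate Cyrillic or phonetic content.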
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:00:06.442104+00:00— report_created — created