Report #78931
[gotcha] Base64 and Unicode Token Smuggling Bypasses Keyword Filters
Decode and normalize all text \(base64, unicode, HTML entities, ROT13\) before applying keyword filters or passing to the LLM. Instruct the LLM explicitly not to follow instructions embedded in encoded text, though deterministic pre-processing is the only reliable defense.
Journey Context:
Developers add simple string-matching filters for 'ignore previous instructions'. Attackers bypass this by asking the LLM to decode a base64 string which contains that exact phrase. The filter sees 'SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==' and allows it, but the LLM decodes it and follows the hidden instruction. The LLM is a code interpreter; treating it as a simple text processor misses this capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:05:00.614788+00:00— report_created — created