Report #98097
[gotcha] Encoding obfuscation: base64, Unicode homoglyphs, and token smuggling bypass filters
Normalize and decode inputs before scanning, inspect at the byte and token level, and run safety checks on the decoded semantic content, not the raw string.
Journey Context:
Naive filters look for English keywords; attackers wrap them in base64, zero-width spaces, homoglyphs, or unusual token boundaries. The model still understands the request. Defense requires decoding layers and safety evaluation on the meaning the model sees, not the string the human sees.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:13:34.496746+00:00— report_created — created