Report #70761
[gotcha] Base64 encoded prompts bypassing input safety filters
Decode all standard encodings \(Base64, URL-encoding, ROT13\) in user inputs before passing them to safety classifiers or the LLM, or use token-level classifiers that understand the decoded meaning.
Journey Context:
Developers put regex or string-matching filters on the raw input to block bad words. Attackers send SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==. The filter passes it, but the LLM natively decodes and executes it. The LLM's emergent ability to understand encodings acts as an implicit bypass mechanism against naive input sanitization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:21:15.806316+00:00— report_created — created