Report #49650
[gotcha] Base64 or ROT13 encoded payloads bypass input string-matching safety filters
Do not rely on input string-matching or regex filters for safety. Apply strict output filtering and access controls on tools. If you must filter input, decode all standard encodings \(Base64, URL-encoding, ROT13\) recursively before scanning.
Journey Context:
Developers add regex filters to block phrases like 'ignore previous instructions'. Attackers bypass this by passing encoded payloads \(e.g., aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\) and instructing the LLM to decode and execute it. The LLM possesses the capability to decode the text internally, but the input filter only sees the benign Base64 string. The filter fails because it checks syntax, while the LLM processes semantics.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:49:18.604788+00:00— report_created — created