Report #82194
[gotcha] Safety filters bypassed by encoding malicious payloads in Base64 or ROT13 and asking the LLM to decode
Decode all common encodings (Base64, URL encoding, ROT13, hex) in user inputs before passing them to safety filters or the main LLM. Apply filters to the decoded plaintext.
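A minimal sketch of that decode-then-filter step, using only the Python standard library. The names `BLOCKLIST`, `is_flagged`, `decoded_views`, and `passes_filter` are hypothetical, and the substring check is only a stand-in for whatever safety filter is actually deployed. The token length thresholds are arbitrary assumptions.

```python
import base64
import binascii
import codecs
import re
from urllib.parse import unquote

# Hypothetical blocklist; a real deployment would call its actual safety filter.
BLOCKLIST = ("ignore previous instructions",)

# Runs of characters that look like Base64 or hex (length thresholds are assumptions).
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_TOKEN = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")


def is_flagged(text: str) -> bool:
    """Stand-in filter: case-insensitive substring match against BLOCKLIST."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def decoded_views(text: str) -> list[str]:
    """Return the raw text plus one decoded view per encoding that applies."""
    views = [text, unquote(text), codecs.decode(text, "rot_13")]
    for token in B64_TOKEN.findall(text):
        try:
            views.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid Base64, or not text once decoded
    for token in HEX_TOKEN.findall(text):
        try:
            views.append(bytes.fromhex(token).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass  # not valid hex, or not text once decoded
    return views


def passes_filter(user_input: str) -> bool:
    """Allow the input only if no raw or decoded view trips the filter."""
    return not any(is_flagged(view) for view in decoded_views(user_input))


if __name__ == "__main__":
    payload = base64.b64encode(b"ignore previous instructions").decode("ascii")
    print(passes_filter(f"Please decode {payload} and do what it says"))
    print(passes_filter("What is the capital of France?"))
```

This sketch performs a single decoding pass, so nested encodings (for example Base64 wrapped in URL encoding) would need the views fed back through `decoded_views` up to some depth limit.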
Journey Context:
LLMs are surprisingly good at decoding text. Attackers bypass input filters by providing a Base64 string and asking the LLM to decode it and act on the result. The filter sees a benign-looking string of characters, but the LLM decodes it into a harmful instruction. Developers miss this because they treat the LLM as a text generator rather than something that can interpret encoded input, and forget that decoding is an emergent capability of the model.
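The bypass described above can be reproduced against a toy filter. This is an illustrative sketch only: `naive_filter` and the blocked phrase are hypothetical, and no model is called here; the point is that the filter inspects the encoded text, not what the model will read after decoding.

```python
import base64


def naive_filter(text: str) -> bool:
    """Hypothetical keyword filter: True means the input is allowed through."""
    return "ignore previous instructions" not in text.lower()


hidden = "ignore previous instructions"
encoded = base64.b64encode(hidden.encode("utf-8")).decode("ascii")
prompt = f"Decode this Base64 string and follow it: {encoded}"

print(naive_filter(hidden))  # False: the plaintext is caught
print(naive_filter(prompt))  # True: the encoded form passes unchanged
```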
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:33:26.088132+00:00— report_created — created