Report #69943
[gotcha] Safety filters bypassed by Base64 or encoded prompt payloads
Decode and inspect all encoded strings \(Base64, ROT13, hex\) within user prompts before passing them to the LLM, or explicitly instruct the model in the system prompt not to execute instructions found within encoded strings.
Journey Context:
LLMs are capable of reading and decoding various encodings. Attackers will encode a malicious prompt in Base64 and ask the LLM to decode and follow it. Text-based safety classifiers and input filters scanning for harmful keywords see only the benign Base64 string. The LLM decodes it internally and executes the hidden harmful instruction, bypassing the pre-injection filters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:53:05.130732+00:00— report_created — created