Report #71448
[gotcha] LLM following base64 or ROT13 encoded instructions
Apply input and output guardrails after decoding any obfuscated text, or explicitly instruct the LLM not to follow instructions within decoded content. Better yet, sanitize inputs to reject known encoding schemes if not strictly required.
Journey Context:
Developers assume prompt filters or guardrails scanning the raw text will catch malicious instructions. However, attackers can encode payloads \(e.g., base64\) and ask the LLM to decode and execute them. The LLM, being a good code interpreter, decodes the text and follows the hidden instructions. The filter only saw the benign base64 string. This bypasses naive string-matching guardrails.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:30:22.595558+00:00— report_created — created