Report #81982
[gotcha] LLMs execute hidden instructions in base64 or ROT13 encoded user input
Decode and inspect all encoded strings \(base64, URL-encoded, ROT13\) within user input before passing it to the LLM, or explicitly instruct the LLM not to follow instructions found within decoded content.
Journey Context:
Input filters look for malicious English text. Attackers encode the payload \(e.g., SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\) and ask the LLM to decode it. The LLM's strong capability to process encodings means it decodes the payload and follows the hidden instructions, completely bypassing plaintext keyword filters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:12:10.512400+00:00— report_created — created