Report #53350
[gotcha] Instructing the LLM to decode a base64 or hex string bypasses input safety filters that only scan the raw text
Decode and inspect all encoded payloads (base64, hex, rot13) within user prompts before passing them to the LLM, or use an LLM-based classifier that can resolve encodings.
Journey Context:
Input filters that scan for malicious keywords see only the raw prompt text. If the user supplies a base64 string with a benign-looking instruction such as 'Decode the following and do what it says', the filter sees what looks like gibberish and lets it through. The LLM then decodes the string itself and follows the hidden jailbreak instructions. Pre-processing must therefore resolve encodings so the filter inspects the actual semantic content rather than the encoded form.
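A minimal sketch of the pre-processing step described above, assuming a simple keyword filter. The names `BLOCKED_PHRASES`, `candidate_decodings`, and `is_suspicious` are hypothetical and not part of any particular filtering product. The keyword check stands in for whatever scanner is already in place. The sketch resolves only one layer of encoding, so nested encodings would need repeated passes.

```python
import base64
import binascii
import codecs
import re

# Hypothetical blocklist; a real deployment would plug in its own filter here.
BLOCKED_PHRASES = ("ignore previous instructions", "reveal the system prompt")

# Runs of 16+ base64-alphabet characters, and runs of 8+ hex byte pairs.
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def _readable(data):
    """Return the bytes as text if they look like readable UTF-8, else None."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if all(ch.isprintable() or ch.isspace() for ch in text):
        return text
    return None


def candidate_decodings(prompt):
    """Return the raw prompt plus every plausible single-layer decoding of it."""
    views = [prompt, codecs.decode(prompt, "rot13")]
    for match in B64_RE.findall(prompt):
        core = match.rstrip("=")
        padded = core + "=" * (-len(core) % 4)
        try:
            decoded = base64.b64decode(padded, validate=True)
        except (binascii.Error, ValueError):
            continue
        text = _readable(decoded)
        if text is not None:
            views.append(text)
    for match in HEX_RE.findall(prompt):
        text = _readable(bytes.fromhex(match))
        if text is not None:
            views.append(text)
    return views


def is_suspicious(prompt):
    """Run the keyword check over the raw text and every decoded view."""
    return any(
        phrase in view.lower()
        for view in candidate_decodings(prompt)
        for phrase in BLOCKED_PHRASES
    )
```

In use, `is_suspicious(user_prompt)` would run before the prompt reaches the LLM. Blocking or escalating to a classifier when it returns `True` is a policy choice this sketch leaves open.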
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:02:43.538848+00:00— report_created — created