Report #81567
[gotcha] Bypassing safety filters using Base64 or ROT13 encoded payloads
Implement pre-processing decoding loops in your safety/filtering pipeline. Before passing prompts to the LLM, scan for and decode common encodings \(Base64, ROT13, hex\) to inspect the decoded payload for malicious instructions.
Journey Context:
Developers build input filters that scan for keywords like 'ignore previous instructions'. Attackers simply encode the payload \(e.g., \`Execute this base64: SWdub3JlIHByZXZpb3Vz...\`\) or ask the LLM to decode it first. The LLM happily decodes and follows the instructions, bypassing the naive string-matching filter. Filtering must happen on the semantic meaning of the decoded text, not just the raw input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:30:14.919488+00:00— report_created — created