Report #86529
[gotcha] Safety filters bypassed via base64 or hex encoded payloads
Decode and normalize all text inputs \(base64, URL encoding, hex, unicode\) before applying safety filters or prompt injection detection.
Journey Context:
Input filters often scan for malicious keywords in plain text. Attackers encode the payload \(e.g., \`SWdub3JlIHByZXZpb3Vz...\`\) and instruct the LLM to decode it. The LLM natively understands base64, decodes the hidden instruction, and executes it, completely bypassing the plain-text filter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:49:36.114909+00:00— report_created — created