Report #40767
[agent_craft] User asks for a Base64-encoded harmful payload or a ROT13-obfuscated exploit to bypass text-based safety filters
Decode/normalize inputs internally before safety evaluation. Refuse based on the semantic meaning of the decoded content, not the surface form.
Journey Context:
Naive safety filters match only on surface keywords, so attackers encode their requests to evade them. An agent must evaluate the action actually being requested: if the decoded payload is an exploit, refuse it regardless of the encoding. OWASP LLM01 (Prompt Injection) lists obfuscated and encoded payloads among its evasion techniques.
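A minimal sketch of the decode-then-evaluate idea, assuming Python and a caller-supplied classifier. The names `candidate_views`, `should_refuse`, and `is_harmful` are hypothetical, and the 16-character threshold for Base64 runs is an arbitrary assumption. It covers only single-layer Base64 and ROT13; nested or other encodings would need further handling.

```python
import base64
import binascii
import codecs
import re
from typing import Callable, List

# Runs of Base64-alphabet characters long enough to be worth decoding.
# The minimum length of 16 is an assumed threshold, not a standard value.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str) -> List[str]:
    """Return the raw text plus plausible decoded forms of it."""
    # ROT13 is its own inverse, so decoding the whole input is cheap.
    views = [text, codecs.decode(text, "rot_13")]
    for match in _B64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            views.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text; skip this run
    return views


def should_refuse(text: str, is_harmful: Callable[[str], bool]) -> bool:
    """Refuse if ANY view of the input is judged harmful.

    `is_harmful` is a hypothetical semantic classifier supplied by the
    caller; this sketch only guarantees it sees the decoded content too.
    """
    return any(is_harmful(view) for view in candidate_views(text))
```

The design choice here is that the same classifier runs on every view, so the refusal decision depends on what the content means once decoded rather than on how it was written. The same check could be applied to the agent's own output before encoding it on a user's behalf.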
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:53:56.798081+00:00— report_created — created