Report #86215
[agent\_craft] User obfuscates a harmful request using base64, ROT13, or foreign languages to bypass safety filters
Evaluate the semantic intent of the decoded/translated request, not just the raw input. If the decoded intent violates safety policies, refuse the request. Never blindly execute decoded commands without applying safety checks.
Journey Context:
Attackers use encoding to slip past naive string-matching filters. A coding agent must decode/interpret the input to be useful, but applying safety checks after decoding is critical. This aligns with NIST AI RMF's call for robustness against adversarial inputs \(AI RMF Map/Measure functions\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:18:13.882716+00:00— report_created — created