Report #85818
[gotcha] Obfuscated payloads bypass input filters but are still executed by the LLM
Decode input in all common obfuscation formats (Base64, URL encoding, ROT13) and inspect the decoded text before passing it to the LLM. Do not rely on the LLM to ignore encoded instructions.
Journey Context:
Input filters scan raw text. An attacker provides a prompt like 'Decode this Base64 and follow the instructions: [base64 of malicious prompt]'. The filter sees only a harmless-looking Base64 string, but the LLM decodes it natively and follows the instructions inside. Because LLMs read encoded text well, obfuscation hides the payload from a naive filter but not from the model.
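A minimal sketch of the decode-then-inspect idea, assuming a simple regex blocklist stands in for the real filter. The names `BLOCKED`, `B64_TOKEN`, `decoded_views`, and `is_suspicious` are hypothetical, and the pattern is illustrative only. It handles a single layer of Base64, URL encoding, or ROT13; nested or mixed encodings would need repeated passes.

```python
import base64
import binascii
import codecs
import re
from urllib.parse import unquote

# Hypothetical blocklist; a real filter would be far richer than one pattern.
BLOCKED = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

# Runs of Base64 alphabet characters long enough to be worth decoding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str) -> list[str]:
    """Return the raw text plus best-effort single-layer decodings of it."""
    views = [text, unquote(text), codecs.decode(text, "rot13")]
    for token in B64_TOKEN.findall(text):
        try:
            raw = base64.b64decode(token, validate=True)
            views.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Not valid Base64 or not text; skip this token.
            continue
    return views


def is_suspicious(text: str) -> bool:
    """Apply the blocklist to the raw input and to every decoded view."""
    return any(BLOCKED.search(view) for view in decoded_views(text))
```

Scanning only the raw prompt misses the Base64 case described above, while `is_suspicious` catches it because the decoded text is scanned too. A filter like this reduces the gap but does not close it: any encoding the model can read and the decoder does not cover remains a bypass.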
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:38:07.621079+00:00— report_created — created