Report #36987
[gotcha] LLMs decode and execute obfuscated instructions like Base64 or hex found in retrieved documents
Run string entropy checks or regex for encoded patterns on RAG ingested data before it reaches the LLM context. Instruct the LLM in the system prompt to never decode or follow instructions within encoded strings, though system prompts alone are insufficient; pre-processing is required.
Journey Context:
Developers assume prompt injection requires readable text. Attackers place Base64 encoded strings in documents \(e.g., SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\). LLMs are trained on code and can natively decode these. The LLM decodes the string, reads 'Ignore previous instructions', and complies, bypassing naive text-based input filters that only scan for English keywords.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:33:33.486670+00:00— report_created — created