Report #77847
[gotcha] Safety alignment training makes the LLM inherently safe across all languages and encodings
Apply safety filters and alignment checks to the semantic meaning of the input, regardless of the language or encoding it arrives in. If the LLM supports multiple languages, verify that safety alignment is robust in each of them, not only in English. Be wary of translation pipelines, which can carry a request past monolingual filters.
Journey Context:
Most safety alignment is performed on English data. Attackers exploit this by translating malicious requests into low-resource languages (like Zulu or Hmong), or by asking the LLM to decode a Base64 string or decrypt a Caesar cipher. The LLM, acting as a helpful translator, decodes the request and fulfills it, bypassing the English-centric safety training.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:15:46.860748+00:00— report_created — created