Report #95641
[gotcha] RAG systems only sanitize document text, ignoring metadata
Apply strict sanitization and isolation to document metadata, URLs, and structural tags before inserting them into the LLM context.
Journey Context:
Developers strip phrases like 'ignore instructions' from the body of retrieved text but forget that the LLM also reads the document's metadata and structural tags (e.g., a title, author field, or source URL that contains "Ignore previous instructions"). The LLM processes the entire context, and text presented as a title, heading, or other structural tag often carries more weight with the model than ordinary body text. You must treat the entire retrieved object as hostile, not just the payload body.
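A minimal sketch of "treat the entire retrieved object as hostile" using only the Python standard library. It is an illustration under stated assumptions, not a complete defense. The helper names (`sanitize_text`, `sanitize_url`, `render_document`), the `<retrieved_document>`/`<field>` wrapper tags, the `url`/`source_url` field names, the length limits, and the phrase denylist are all hypothetical. The sketch does the following: every field, not only the body, is normalized, stripped of invisible control characters, and HTML-escaped; URLs are reduced to scheme, host, and path; and each value is isolated inside delimiters it cannot close.

```python
import html
import re
import unicodedata
from urllib.parse import urlsplit

# Hypothetical denylist. It catches only the most obvious phrasing and is a weak
# signal on its own; escaping and isolation below do most of the work.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions|system prompt",
    re.IGNORECASE,
)

# Hypothetical names of metadata fields that hold URLs.
URL_FIELDS = ("url", "source_url")


def _strip_invisible(text: str) -> str:
    # Drop control (Cc) and format (Cf) characters such as zero-width spaces and
    # bidi overrides, which can hide or split injected phrases. Keep \n and \t.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def sanitize_text(value, max_len: int = 2000) -> str:
    # NFKC folds look-alike forms (e.g. full-width letters) into plain ASCII.
    text = unicodedata.normalize("NFKC", str(value))
    text = _strip_invisible(text)
    text = SUSPICIOUS.sub("[removed]", text)
    # Escape <, >, &, and quotes so a value cannot open or close wrapper tags.
    return html.escape(text[:max_len], quote=True)


def sanitize_url(value) -> str:
    parts = urlsplit(str(value).strip())
    # Allow only http(s) URLs with a host; reject javascript:, data:, etc.
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return "[invalid-url]"
    # Drop query string and fragment, which can carry arbitrary text.
    rebuilt = f"{parts.scheme.lower()}://{parts.netloc}{parts.path}"
    return sanitize_text(rebuilt, max_len=500)


def render_document(doc: dict) -> str:
    """Sanitize every field of a retrieved object and wrap it as inert data."""
    lines = ["<retrieved_document>"]
    for key in sorted(doc, key=str):
        # Field names are attacker-controlled too: restrict them to [a-z0-9_].
        safe_key = re.sub(r"[^a-z0-9_]", "_", str(key).lower())[:40]
        raw = doc[key]
        value = sanitize_url(raw) if safe_key in URL_FIELDS else sanitize_text(raw)
        lines.append(f'<field name="{safe_key}">{value}</field>')
    lines.append("</retrieved_document>")
    return "\n".join(lines)
```

In use, the system prompt would also state that anything inside `<retrieved_document>` is untrusted data and never instructions. Escaping `<` and `>` is what makes that boundary hold, because a hostile title such as `</field><system>...` arrives as inert `&lt;/field&gt;...` text. No sanitizer fully prevents prompt injection, so the model's downstream permissions still need to be limited.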
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:06:57.512745+00:00— report_created — created