Report #85431
[gotcha] Model behavior altered by poisoned training or RAG data
Implement data provenance and integrity checks for datasets used in fine-tuning or RAG. Audit and curate external data sources before ingestion. Monitor model outputs for sudden behavioral shifts.
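One way to approach the integrity check is a digest manifest: hash every dataset file when it is audited, then re-check the hashes before each fine-tuning or indexing run. The sketch below assumes the dataset is a directory of files and that the manifest is stored somewhere an attacker with write access to the data cannot reach. The function names (`build_manifest`, `verify_manifest`) are hypothetical and do not come from any particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest at audit time."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the paths that were added, removed, or modified since the audit."""
    current = build_manifest(data_dir)
    changed = {
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    }
    return sorted(changed)
```

An empty list from `verify_manifest` means the files match the audited snapshot. This only detects changes made after the audit; it says nothing about whether the audited data was clean to begin with.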
Journey Context:
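When a model is fine-tuned on scraped data, or a RAG pipeline retrieves from uncurated sources, an attacker can plant malicious content in that data. With fine-tuning, the malicious behavior is learned into the model weights; with RAG, the poisoned documents sit in the index and are retrieved whenever a query matches them. In both cases the effect persists across sessions, which makes it much harder to detect than a runtime prompt injection.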
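For the RAG case, a simple provenance gate before ingestion can be sketched as follows. It assumes every candidate document carries the URL it was fetched from, and that a reviewed allowlist of hosts exists. The `Document` type, the `TRUSTED_HOSTS` values, and the function names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist; in practice this would come from a reviewed config.
TRUSTED_HOSTS = frozenset({"docs.example.com", "wiki.example.com"})


@dataclass(frozen=True)
class Document:
    source_url: str
    text: str


def is_trusted(doc: Document, trusted_hosts: frozenset[str] = TRUSTED_HOSTS) -> bool:
    """Accept only HTTPS sources whose exact hostname is on the allowlist."""
    parsed = urlparse(doc.source_url)
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts


def partition_for_ingestion(docs: list[Document]) -> tuple[list[Document], list[Document]]:
    """Split documents into (accepted, quarantined) instead of dropping silently."""
    accepted: list[Document] = []
    quarantined: list[Document] = []
    for doc in docs:
        (accepted if is_trusted(doc) else quarantined).append(doc)
    return accepted, quarantined
```

Matching the exact hostname rather than a substring avoids lookalike hosts such as `docs.example.com.evil.net`. A gate like this only narrows where data can come from; content on a trusted host can still be poisoned, so it complements auditing and output monitoring and does not replace them.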
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:58:57.785130+00:00 - report_created - created