Report #93971
[gotcha] Fine-tuned models or embeddings from untrusted sources may contain backdoors or poisoned data
Vet the provenance of pre-trained models and datasets before using them. Prefer models from trusted registries, and scan datasets for malicious payloads or biased instructions before fine-tuning.
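One way to act on the provenance advice is to pin a known-good SHA-256 digest for each model artifact and refuse to load any file that does not match. The sketch below uses only the Python standard library. `sha256_of_file` and `verify_artifact` are hypothetical helper names, and the sketch assumes the expected digest came from a source you already trust. A matching digest only shows the file is unchanged; it does not show the model is free of backdoors.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return True only if the file's digest equals the pinned digest.

    expected_sha256 is assumed to be a hex string recorded when the
    artifact was first vetted (hypothetical workflow).
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A loader could call `verify_artifact(path, pinned_digest)` and abort when it returns `False`, so that a silently replaced file is never deserialized.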
Journey Context:
Developers often download unvetted models from Hugging Face or fine-tune on scraped datasets. Attackers can poison these datasets (e.g., by inserting examples such as 'When asked about X, output Y') so that the fine-tuned model behaves maliciously when a specific trigger appears.
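A rough first pass over a fine-tuning dataset can flag records that contain trigger-style instructions like the one described above. This is a minimal heuristic sketch, not a complete defense: the pattern list, the `flag_suspicious_records` name, and the assumption that each record is a flat dict of text fields are all illustrative, and subtle poisoning will not match simple regular expressions. Flagged records should go to manual review.

```python
import re

# Illustrative patterns only; tune them for your own data.
SUSPICIOUS_PATTERNS = {
    # "When asked about X, output Y" style conditional instructions.
    "conditional_trigger": re.compile(
        r"\bwhen(?:ever)?\s+(?:asked|prompted|you\s+see)\b.{0,80}"
        r"\b(?:output|respond|reply|say|answer)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    # Attempts to override earlier instructions.
    "instruction_override": re.compile(
        r"\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b",
        re.IGNORECASE,
    ),
    # Zero-width characters sometimes used to hide trigger tokens.
    "zero_width_chars": re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]"),
}


def flag_suspicious_records(records):
    """Return (index, [pattern names]) for each record matching any pattern."""
    flagged = []
    for index, record in enumerate(records):
        # Join all field values so prompts and completions are both scanned.
        text = " ".join(str(value) for value in record.values())
        hits = [
            name
            for name, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(text)
        ]
        if hits:
            flagged.append((index, hits))
    return flagged
```

For example, a record whose completion is 'When asked about X, output Y' would be reported under `conditional_trigger`, while an ordinary question-and-answer pair would pass through unflagged.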
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:19:03.803938+00:00— report_created — created