Report #98093
[gotcha] Training data extraction: prompts can recover verbatim memorized PII or secrets
Assume any text in the pre-training corpus can be regurgitated verbatim. Before training, scrub PII, credentials, and proprietary code from the data; apply differential-privacy techniques during training where feasible; and monitor model outputs for verbatim overlap with the training data.
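A minimal sketch of the scrubbing step, assuming a regex-based pass over raw text before it enters the training set. The pattern set, the placeholder labels, and the function name `scrub` are hypothetical and chosen for illustration; they are not from any specific library, and a real pipeline would need much broader coverage (names, addresses, other key formats) plus manual review.

```python
import re

# Hypothetical, deliberately narrow patterns. Order matters: the key pattern
# runs first so that digit runs inside a key are not mistaken for a phone number.
PATTERNS = {
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scrub(text: str) -> str:
    """Replace each match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact alice@example.com or +1 415-555-0100, key AKIAABCDEFGHIJKLMNOP"
    print(scrub(sample))
```

Placeholders are used instead of plain deletion so the surrounding sentence structure stays intact for training. Regex scrubbing misses anything the patterns do not anticipate, so it should be treated as one layer rather than a guarantee.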
Journey Context:
Completion APIs can be prompted into emitting exact training sequences, including email addresses, phone numbers, and API keys. The risk is not theoretical: published extraction attacks have recovered verbatim training text from production models. Token-level mitigation and data curation before training are the only real fixes.
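One way to watch for exact training sequences in outputs is an n-gram overlap check against an index built from sensitive training text. The sketch below assumes whitespace tokenization and an in-memory set; the names `build_index` and `memorized_fraction` and the default window of 8 tokens are hypothetical choices for illustration, not an established metric definition.

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: list, n: int) -> Set[NGram]:
    """Return the set of all contiguous n-token windows in tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus_docs: Iterable[str], n: int = 8) -> Set[NGram]:
    """Collect every n-gram from the documents we want to guard."""
    index: Set[NGram] = set()
    for doc in corpus_docs:
        index |= ngrams(doc.split(), n)
    return index


def memorized_fraction(output: str, index: Set[NGram], n: int = 8) -> float:
    """Fraction of the output's n-grams that appear verbatim in the index."""
    grams = ngrams(output.split(), n)
    if not grams:
        return 0.0  # output shorter than n tokens: nothing to compare
    return len(grams & index) / len(grams)


if __name__ == "__main__":
    docs = ["the secret key for the staging server is hunter2 do not share"]
    idx = build_index(docs, n=4)
    print(memorized_fraction("staging server is hunter2 do", idx, n=4))
```

A high fraction flags an output for blocking or review. The same `n` must be used for indexing and for checking. An in-memory set only scales to a small guarded corpus; larger corpora would need a more compact structure, which this sketch does not attempt.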
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:13:24.864599+00:00— report_created — created