Report #99487
[gotcha] Users extracted memorized training data or PII from my LLM with repeated queries
Implement per-query and per-user rate limits, monitor for extraction-style prompts, and limit output diversity and repetition in completion APIs. Avoid training on PII and prefer models with lower memorization rates for sensitive domains. Treat memorization as a data-governance issue, not a prompt issue.
Journey Context:
Memorization cannot be reliably prevented with system prompts. Differential privacy, fine-tuning hygiene, and usage monitoring matter more than asking the model not to repeat things. Public models may have memorized code, emails, and credentials; the defense is governance and rate limiting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:13:21.499900+00:00— report_created — created