Report #55625
[research] Regurgitating memorized training data instead of following novel constraints
Apply a higher temperature or use explicit negative prompting (e.g. "Do not use the standard boilerplate for X") to break out of the memorized attractor state.
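A minimal sketch of combining the two workarounds: append an explicit negative constraint to the prompt, then retry at rising temperatures until the canonical phrasing no longer appears. Everything named here is an assumption for illustration. `generate` is a hypothetical caller-supplied function taking `(prompt, temperature)` and returning text, not a call from any specific library, and the temperature schedule is an arbitrary example rather than a recommended setting.

```python
from typing import Callable, Sequence


def add_negative_constraint(prompt: str, banned: Sequence[str]) -> str:
    """Append an explicit instruction forbidding each canonical phrase."""
    rules = "\n".join(f'- Do not use the phrase "{p}".' for p in banned)
    return f"{prompt}\n\nConstraints:\n{rules}"


def generate_avoiding(
    generate: Callable[[str, float], str],  # hypothetical: (prompt, temperature) -> text
    prompt: str,
    banned: Sequence[str],
    temperatures: Sequence[float] = (0.7, 1.0, 1.3),  # illustrative schedule only
) -> str:
    """Retry with rising temperature until no banned phrase appears."""
    constrained = add_negative_constraint(prompt, banned)
    text = ""
    for temperature in temperatures:
        text = generate(constrained, temperature)
        lowered = text.lower()
        # Case-insensitive substring check for the memorized boilerplate.
        if not any(p.lower() in lowered for p in banned):
            return text
    # Every attempt still contained a banned phrase; return the last one
    # so the caller can inspect it instead of failing silently.
    return text
```

The substring check is deliberately crude. It only catches verbatim regurgitation, so a paraphrase of the memorized text would pass, and the result should still be reviewed by hand.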
Journey Context:
LLMs can fall into attractor states for sequences that are heavily represented in their training data. The output reads like a confident factual assertion, but it is rote memorization overriding the specific constraints in the prompt. Raising the sampling temperature or explicitly forbidding the canonical answer can push the model out of that attractor.
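To illustrate why temperature matters here, the sketch below applies standard temperature-scaled softmax to a set of made-up logits in which one token dominates, standing in for the memorized continuation. The logit values are invented for the example and do not come from any real model. Dividing the logits by a larger temperature flattens the distribution, so the dominant token becomes less likely to be sampled.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits: index 0 stands in for the memorized continuation.
logits = [6.0, 2.0, 1.5, 1.0]
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 1.5)

print(f"P(memorized) at T=0.5: {low[0]:.3f}")
print(f"P(memorized) at T=1.5: {high[0]:.3f}")
```

Higher temperature only reweights the distribution. The memorized token remains the single most likely choice, which is why this is a mitigation and not a guarantee.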
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:51:35.810440+00:00 — report_created — created