Report #62737
[research] LLM generates code containing known security vulnerabilities because these patterns dominate its training data
Append explicit negative constraints to the prompt (e.g., 'Do not use string concatenation for SQL queries; use parameterized queries') and run a static application security testing (SAST) tool on the generated code as an automated validation loop.
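To illustrate what the example constraint asks for, here is a minimal sketch using Python's standard sqlite3 module. The users table, its rows, and the injection string are invented for the demonstration. The first function builds the query by string concatenation, which is the pattern the constraint forbids. The second binds the value as a query parameter.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # In-memory database with a hypothetical users table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users (name, role) VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # VULNERABLE: user input is spliced directly into the SQL text.
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized: the driver binds `name` as data, never as SQL syntax.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = setup()
payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # every row comes back
print(find_user_safe(conn, payload))    # []
```

With the concatenated version, the input closes the string literal and appends an always-true condition, so the whole table is returned. With the parameterized version, the same input is compared as an ordinary string and matches nothing.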
Journey Context:
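LLMs learn to reproduce the distribution of their training data. If vulnerable patterns (for example, SQL queries assembled by string concatenation) are more common in the training corpus than their secure equivalents, the model's prior favors the vulnerable version. A generic instruction such as 'write secure code' is often insufficient, because it does not name the specific pattern to avoid and the model has no reliable grasp of the security consequences of the code it generates. Specific negative constraints steer the output away from known-bad patterns, and an external SAST tool catches what still slips through; both are required to override the statistical prior.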
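The combination described above, explicit negative constraints plus an external checker, can be sketched as a generate, check, and retry loop. This is a minimal sketch under several assumptions: `generate` stands in for whatever LLM client is in use (the `fake_llm` stub here is hypothetical), and `toy_sql_check` is a deliberately tiny AST-based rule written for this example, not a real SAST tool. In practice the check step would run a scanner such as Bandit or Semgrep and parse its report. Findings from each failed round are appended to the prompt as additional, more specific constraints.

```python
import ast
from typing import Callable, List

# Base negative constraints appended to every prompt (wording is illustrative).
NEGATIVE_CONSTRAINTS = [
    "Do not use string concatenation for SQL queries; use parameterized queries.",
]


def build_prompt(task: str, extra: List[str]) -> str:
    """Append the base constraints plus any feedback from earlier rounds."""
    lines = [task, "", "Constraints:"]
    lines += [f"- {c}" for c in NEGATIVE_CONSTRAINTS + extra]
    return "\n".join(lines)


def toy_sql_check(source: str) -> List[str]:
    """Toy stand-in for a SAST tool (NOT a real scanner).

    Flags .execute()/.executemany() calls whose first argument is built
    inline with +, %, an f-string, or str.format(). It misses queries
    assembled in a variable first; real tools use data-flow analysis.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in ("execute", "executemany")
                and node.args):
            continue
        arg = node.args[0]
        is_format_call = (isinstance(arg, ast.Call)
                          and isinstance(arg.func, ast.Attribute)
                          and arg.func.attr == "format")
        if isinstance(arg, (ast.BinOp, ast.JoinedStr)) or is_format_call:
            findings.append(
                f"line {node.lineno}: SQL built dynamically in {node.func.attr}()"
            )
    return findings


def generate_with_validation(
    task: str,
    generate: Callable[[str], str],
    check: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> str:
    """Regenerate until the checker reports no findings or rounds run out."""
    extra: List[str] = []
    findings: List[str] = []
    for _ in range(max_rounds):
        code = generate(build_prompt(task, extra))
        findings = check(code)
        if not findings:
            return code
        # Feed the findings back as extra, more specific constraints.
        extra = [f"Fix: {f}" for f in findings]
    raise RuntimeError(f"still failing after {max_rounds} rounds: {findings}")


def fake_llm(prompt: str) -> str:
    # Hypothetical model stub: emits the vulnerable pattern until given a "Fix:" hint.
    if "Fix:" in prompt:
        return 'cur.execute("SELECT * FROM t WHERE id = ?", (uid,))\n'
    return 'cur.execute("SELECT * FROM t WHERE id = " + uid)\n'


code = generate_with_validation("Look up a row by id.", fake_llm, toy_sql_check)
print(code)
```

Raising after `max_rounds` rather than returning the last attempt keeps code that still fails the check from being accepted silently.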
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:47:14.237344+00:00— report_created — created