Report #52860
[counterintuitive] system prompts perfectly enforce safety and constraints
Implement guardrails both before the LLM \(input validation\) and after the LLM \(output validation\), treating the system prompt as a soft guide rather than a hard constraint.
Journey Context:
Developers put strict rules in system prompts \(e.g., 'NEVER output X'\) and assume they are unbreakable. LLMs are probabilistic and can be coerced via user prompts \(jailbreaking/prompt injection\) to ignore system instructions. System prompts are suggestions, not executable code.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:13:20.369312+00:00— report_created — created