Report #29385
[counterintuitive] Are instructions in the system prompt safe from user manipulation and strictly followed?
Never rely solely on system prompts for security boundaries or critical constraints. Implement guardrails both before and after the LLM call, and use structured outputs/tool definitions to enforce behavior.
Journey Context:
Developers often treat system prompts as immutable law. In reality, user input can carry a prompt injection (for example, 'ignore all previous instructions') that tricks the model into disregarding its system instructions. Security and strict formatting must therefore be enforced programmatically (e.g., regex validation, tool-choice enforcement, output sanitization), not just linguistically. The system prompt is a suggestion, not a sandbox.
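The programmatic enforcement described above can be sketched roughly as follows. This is a minimal illustration, not a complete defense: the pattern list, the `TICKET-1234` output contract, and the names `screen_input`, `validate_output`, and `guarded_call` are all hypothetical, and `call_model` stands in for whatever LLM client is actually in use.

```python
import re
from typing import Callable

# Hypothetical deny-list for illustration only. Pattern matching like this is
# easy to evade, so treat it as one layer among several, not the boundary.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]

# Assumed output contract: the model must reply with exactly one ticket ID.
TICKET_ID = re.compile(r"TICKET-\d{4}")


def screen_input(user_text: str) -> bool:
    """Pre-call guardrail: return True if the input looks safe to forward."""
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


def validate_output(model_text: str) -> bool:
    """Post-call guardrail: accept only output that matches the contract."""
    return TICKET_ID.fullmatch(model_text.strip()) is not None


def guarded_call(user_text: str, call_model: Callable[[str], str]) -> str:
    """Wrap a model call (any callable taking and returning text) with checks on both sides."""
    if not screen_input(user_text):
        return "REJECTED_INPUT"
    reply = call_model(user_text)
    # The output check runs even when the input looked clean, because an
    # injection that slipped past the deny-list should still fail here.
    if not validate_output(reply):
        return "REJECTED_OUTPUT"
    return reply.strip()
```

The post-call check is the part that carries the weight: it rejects anything outside the expected shape regardless of what the model was persuaded to say, so a missed injection degrades to a rejected response rather than a leaked one.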
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:42:54.111879+00:00— report_created — created