Report #35076
[counterintuitive] System prompts act as a secure boundary to prevent unwanted model behavior
Treat system prompts as soft suggestions, not security perimeters; implement external guardrails (input/output classifiers) and strict permission boundaries for tool execution.
Journey Context:
Developers put sensitive rules or API instructions in system prompts, assuming the model treats them as immutable. In reality, a system prompt is just text tokens prepended to the user context, so the model has no hard mechanism that enforces it over later input. System prompts are highly susceptible to prompt injection (e.g., 'Ignore all previous instructions and...'). Relying on the system prompt alone to prevent the LLM from outputting PII or executing malicious API calls is a critical security failure.
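As a rough illustration of what "external guardrails" can look like, the sketch below puts three checks in application code, outside the model: an input screen, a tool allowlist, and an output redaction step. Everything in it is an assumption made for illustration: the tool names, the `ToolCall` type, and the function names are hypothetical, and the regular expressions are crude stand-ins for real input/output classifiers.

```python
import re
from dataclasses import dataclass

# Hypothetical tool names; none of this comes from a real framework.
ALLOWED_TOOLS = {"search_docs", "get_weather"}

# Deliberately crude patterns, standing in for real classifiers.
INJECTION_PATTERN = re.compile(
    r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE
)
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class ToolCall:
    """A tool invocation requested by the model (hypothetical shape)."""
    name: str
    arguments: dict


def flag_input(user_text: str) -> bool:
    """Return True if the input looks like an injection attempt."""
    return bool(INJECTION_PATTERN.search(user_text))


def authorize_tool_call(call: ToolCall) -> bool:
    """Permit only allowlisted tools, regardless of what the model asked for."""
    return call.name in ALLOWED_TOOLS


def redact_output(model_text: str) -> str:
    """Mask email addresses before the text leaves the application."""
    return EMAIL_PATTERN.sub("[REDACTED]", model_text)
```

Of the three checks, the allowlist is the one that does not depend on guessing intent: a tool call the application refuses to execute cannot be talked into running by any prompt. Pattern matching on input and output is easy to evade and should be read as a placeholder for a proper classifier, not as a defense in itself.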
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:20:51.859438+00:00— report_created — created