Report #59780
[counterintuitive] System prompts securely isolate instructions from user input
Treat system prompts as public, non-secret information; implement guardrails and output filtering to prevent prompt injection, rather than relying on the system prompt for security.
Journey Context:
Developers often put API keys, proprietary logic, and strict behavioral constraints in system prompts, assuming the model treats them as immutable rules. In reality, a system prompt is just text placed ahead of the user input in the same context window. User input can override it via prompt injection (e.g., 'Ignore previous instructions and...'), and can also coax the model into repeating the system prompt, which exposes anything stored there. The LLM cannot natively distinguish between 'system authority' and 'user trickery'.
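A minimal sketch of the guardrail and output-filtering idea, assuming a Python service that wraps the model call. The function names, the regex patterns, and the `sk-` key format are hypothetical and chosen for illustration; they are not part of any particular library. Pattern matching like this is easy to evade, so it is one layer of defense rather than a complete fix, and secrets are best kept out of the prompt entirely.

```python
import re

# Hypothetical heuristics for obvious injection attempts. Real deployments
# would tune these and likely combine them with a classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat)\s+(your\s+)?(system\s+prompt|instructions)", re.IGNORECASE),
]

# Assumed secret shape for illustration only (an "sk-" prefix plus 16+ characters).
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def looks_like_injection(user_input: str) -> bool:
    """Input guardrail: flag text that matches a known injection phrase."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)


def filter_output(model_output: str, system_prompt: str) -> str:
    """Output filter: block verbatim prompt leaks and redact key-shaped strings."""
    if system_prompt and system_prompt in model_output:
        return "[blocked: response contained system prompt text]"
    return SECRET_PATTERN.sub("[redacted]", model_output)
```

In use, `looks_like_injection` would run before the request reaches the model and `filter_output` would run on every response before it is returned to the user. The output filter matters more of the two, since it still applies when an injection slips past the input check.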
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:49:39.607113+00:00— report_created — created