Report #49336
[counterintuitive] system prompts perfectly constrain model behavior
Treat system prompts as strong suggestions, not immutable code. Implement programmatic output validation and sandboxing for security-critical constraints. Never pass untrusted user input into the same context window as security instructions without isolation.
Journey Context:
Developers put 'NEVER do X' in a system prompt and assume it is an immutable law. Prompt injections arriving in user messages or tool outputs can override system instructions, because the model reads instructions and untrusted data as one token stream with no enforced privilege boundary between them. System prompts are soft, probabilistic constraints; they are not access control lists. As models gain the ability to act on behalf of users, the attack surface of prompt injection grows, so security-critical limits need hard, programmatic enforcement outside the LLM.
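A minimal sketch of what a hard constraint outside the LLM could look like, assuming the model is asked to reply with a JSON tool call. The tool names, roles, and the `ROLE_PERMISSIONS` / `TOOL_ARGS` tables are hypothetical and exist only for illustration; the point is that the check runs in ordinary code and does not depend on the model obeying its system prompt.

```python
import json
from dataclasses import dataclass

# Hypothetical permission table: which tools each role may invoke.
ROLE_PERMISSIONS = {
    "viewer": {"search_docs"},
    "admin": {"search_docs", "delete_record"},
}

# Hypothetical argument schema: required argument names and exact types.
TOOL_ARGS = {
    "search_docs": {"query": str},
    "delete_record": {"record_id": int},
}


class RejectedAction(Exception):
    """Raised when a model-proposed action fails a hard check."""


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


def parse_tool_call(raw: str) -> ToolCall:
    """Parse raw model text as a strict JSON tool call, or reject it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedAction("output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "args"}:
        raise RejectedAction("output must have exactly 'tool' and 'args'")
    tool, args = data["tool"], data["args"]
    if not isinstance(tool, str) or tool not in TOOL_ARGS:
        raise RejectedAction("unknown tool")
    schema = TOOL_ARGS[tool]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise RejectedAction("unexpected or missing arguments")
    for name, expected in schema.items():
        # Exact type match, so that e.g. True is not accepted as an int.
        if type(args[name]) is not expected:
            raise RejectedAction(f"argument {name!r} has the wrong type")
    return ToolCall(tool=tool, args=args)


def authorize(role: str, call: ToolCall) -> None:
    """Enforce the permission table; unknown roles get no tools."""
    if call.tool not in ROLE_PERMISSIONS.get(role, set()):
        raise RejectedAction(f"role {role!r} may not call {call.tool!r}")


def check_model_output(role: str, raw: str) -> ToolCall:
    """Validate and authorize a model reply before anything is executed.

    `role` must come from the authenticated session, never from model text.
    """
    call = parse_tool_call(raw)
    authorize(role, call)
    return call
```

With this in place, an injected instruction that convinces the model to emit `{"tool": "delete_record", "args": {"record_id": 7}}` during a `viewer` session is still rejected by `check_model_output`, because the permission decision is made in code rather than in the prompt.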
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:17:27.794344+00:00— report_created — created