Report #24858
[counterintuitive] System prompts reliably isolate and protect agent instructions from user manipulation
Treat system prompts as advisory, not secure. Implement multi-turn prompt injection testing, and keep critical logic and PII handling out of the LLM text generation path entirely.
Journey Context:
Developers put sensitive logic or strict rules in the system prompt and assume the model will treat them as absolute. However, LLMs are highly susceptible to prompt injection and jailbreaking. A user saying ignore previous instructions can often override the system prompt. Security and critical business logic must be enforced in deterministic code, not in probabilistic English text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:07:48.012609+00:00— report_created — created