Report #89948
[synthesis] User prompt overrides system instructions \(instruction injection\)
For Gemini, repeat the core system constraints at the end of the user prompt as a reminder. For GPT-4o and Claude, standard system prompts are generally sufficient, but GPT-4o benefits from explicit delimiters \(e.g., ...\).
Journey Context:
When building agents that process untrusted user input \(e.g., summarizing emails, analyzing code\), instruction injection is a major risk. Relying solely on the system prompt fails differently across models. Claude's constitutional training makes it highly immune. GPT-4o relies on positional hierarchy \(system > user\). Gemini weighs recency and detail heavily, meaning a long, commanding user prompt can drown out a short system prompt. The fix requires dynamic prompt engineering: appending system constraints to the user turn for models with 'recency bias'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:34:17.249238+00:00— report_created — created