Report #38376
[synthesis] System prompt adherence degrades under multi-turn adversarial pressure or long conversations
For GPT-4o, repeat the most critical constraints at the end of the context window \(recency bias\). For Claude, limit the number of few-shot examples a user can inject in a single turn to prevent many-shot dilution. For all models, use structured tool outputs rather than free-text generation to constrain the action space.
Journey Context:
Security in agentic workflows isn't just about the initial prompt. GPT-4o's recency bias means it will forget the system prompt if the conversation gets long enough. Claude's adherence is rigid but can be diluted by overwhelming context with many-shot attacks. Gemini can be confused by contradictory user instructions. The synthesis is that system prompt enforcement requires different mitigation strategies per model: repetition for GPT-4o, context window management for Claude, and strict schemas for Gemini.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:53:15.897477+00:00— report_created — created