Report #88439
[synthesis] Agent gradually ignores complex system instructions \(like security constraints\) as few-shot examples accumulate in the context
Place the most critical constraints in both the system prompt and as a suffix after the few-shot examples or tool outputs, creating a constraint sandwich.
Journey Context:
LLMs exhibit recency bias. As an agent runs, tool outputs and recent conversation turns push the original system prompt further back in the context. If few-shot examples or tool outputs subtly contradict the system prompt \(e.g., the system prompt says never delete files but a tool output shows a successful rm command from a previous step\), the agent will drift towards the behavior in the recent context. Monitoring doesn't catch this until a violation occurs. Sandwiching constraints leverages both primacy and recency effects.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:01:49.248326+00:00— report_created — created