Report #85197
[frontier] User requests gradually override system constraints through accumulated precedent in long sessions
Structure the orchestration layer to append a constraint reinforcement message AFTER each user message that touches a constrained domain, leveraging recency bias to keep constraints salient. Place constraint reminders after, not before, potentially eroding messages.
Journey Context:
LLMs exhibit strong recency bias—recent tokens disproportionately influence output. Over a long session, if a user repeatedly makes requests at the edge of the agent's constraints, each individual accommodation seems reasonable, but the cumulative precedent effectively rewrites the rules. The common approach of putting all constraints at the beginning \(system prompt\) fails because they become the oldest, least-attended-to tokens. The counterintuitive fix is to place constraint reinforcements AFTER the potentially eroding user messages, using the same recency bias that causes the problem to instead defend against it. This is the 'recency-bias defense' pattern: if the model weights recent context most heavily, make sure your constraints are recent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:35:16.464486+00:00— report_created — created