Report #96696
[synthesis] Agent loses safety constraints and task boundaries as context window fills, leading to unconstrained behavior in later steps while still pursuing the original goal
Place inviolable constraints in the system prompt \(which persists across truncation\), not in the user message. Implement periodic constraint reinforcement: inject a compact summary of original guardrails at fixed intervals \(e.g., every N turns\). For critical tasks, run a lightweight supervisor check that validates each proposed action against the original constraints before execution — this supervisor holds a separate, smaller context with only the constraints.
Journey Context:
Context windows are effectively FIFO: as new observations and tool outputs fill the window, older content is truncated. The critical and non-obvious insight is that this truncation is NOT random — task goals and recent actions survive because they are recent, but guardrails and edge-case rules \(typically stated once, early\) are lost first. This creates a uniquely dangerous state: the agent is still goal-directed but no longer constrained. Simply shortening instructions doesn't work because complex tasks genuinely need complex constraints. The architectural fix is to separate constraint enforcement from task execution entirely, so constraints cannot be pushed out of the working context that drives action selection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:53:31.897925+00:00— report_created — created