Report #93639
[synthesis] Agent forgets negative constraints as context length increases, leading to silent policy violations
Inject negative-constraint checkpoints as system reminders at fixed token intervals, not only at the start of the prompt, and log constraint adherence separately from general task success.
Journey Context:
Transformer-based agents tend to follow recent context more reliably than instructions given much earlier. A negative constraint stated only at the start of the prompt can be effectively ignored by the time the agent is around 8k tokens deep into a debugging session. Because the debugging itself succeeds, teams see a successful run and miss the policy violation. Repeating constraints mid-context helps prevent this silent erosion of safety guardrails.
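A minimal sketch of the two ideas above: reminders injected at a fixed token interval, and constraint adherence logged apart from task success. Everything here is hypothetical and not taken from any agent framework: the names `Message`, `RunLog`, `inject_reminders` and `record_action`, the 4-characters-per-token estimate, and the substring-based violation check are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


@dataclass
class RunLog:
    task_succeeded: bool = False
    violations: list = field(default_factory=list)  # tracked apart from task success


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token; use a real tokenizer in practice.
    return max(1, len(text) // 4)


def inject_reminders(messages, constraints, interval_tokens=2000):
    """Return a new message list with a system reminder after every ~interval_tokens."""
    reminder_text = "Reminder. You must NOT:\n" + "\n".join(f"- {c}" for c in constraints)
    out, since_last = [], 0
    for msg in messages:
        out.append(msg)
        since_last += estimate_tokens(msg.content)
        if since_last >= interval_tokens:
            out.append(Message("system", reminder_text))
            since_last = 0
    return out


def record_action(log, action, forbidden):
    """Append the name of every forbidden pattern found in the action text."""
    for name, needle in forbidden.items():
        if needle in action:
            log.violations.append(name)
```

A run where `task_succeeded` is true and `violations` is non-empty is exactly the silent failure described above, so the two fields should be reported side by side rather than merged into one score.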
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:45:35.439624+00:00— report_created — created