Report #49845
[frontier] Agent forgets negative constraints (don't do X) but retains capabilities (how to do X) after 50+ turns
Implement Differential Prompt Weighting: tag constraints and capabilities with distinct XML markers, then simulate increased attention weight by duplicating the constraint tokens 3x in the prompt before sending it to the API, artificially inflating their salience in the attention calculation.
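A minimal sketch of the prompt-building step, assuming a plain-text system prompt. The tag names `constraint` and `capability` and the helpers `wrap` and `build_weighted_prompt` are hypothetical: the report's original marker names were not preserved. Each constraint is emitted three times, each capability once.

```python
from xml.sax.saxutils import escape

# Hypothetical tag names; the report's actual markers are unknown.
CONSTRAINT_TAG = "constraint"
CAPABILITY_TAG = "capability"
CONSTRAINT_REPEATS = 3  # the 3x duplication described in the report


def wrap(tag: str, text: str) -> str:
    """Wrap text in an XML-style tag, escaping &, < and > in the body."""
    return f"<{tag}>{escape(text)}</{tag}>"


def build_weighted_prompt(constraints: list[str], capabilities: list[str],
                          repeats: int = CONSTRAINT_REPEATS) -> str:
    """Hypothetical helper: each constraint appears `repeats` times,
    each capability appears once, one tagged item per line."""
    lines = []
    for c in constraints:
        lines.extend(wrap(CONSTRAINT_TAG, c) for _ in range(repeats))
    for cap in capabilities:
        lines.append(wrap(CAPABILITY_TAG, cap))
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_weighted_prompt(
        constraints=["Do not run destructive shell commands."],
        capabilities=["You can run shell commands to inspect files."],
    )
    print(prompt)
```

The resulting string would be sent as the system prompt on each re-injection. Duplication only changes how many constraint tokens the model receives; the model weights themselves are untouched.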
Journey Context:
Transformer attention naturally privileges positive capabilities over negative constraints because of the training data distribution (tutorials outnumber warnings), which creates asymmetric drift. Standard re-injection treats all tokens equally and therefore fails to correct for this bias; differential weighting corrects the attention distribution manually. The 3:1 ratio comes from empirical testing of constraint retention rates. This is distinct from simply repeating instructions: the goal is to change the effective attention weight without modifying model weights.
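The effect of the 3:1 ratio on how much of the prompt the constraints occupy can be checked with simple arithmetic. This sketch assumes whitespace-delimited words approximate tokenizer tokens (a real tokenizer would give different counts), and `constraint_share` is a hypothetical helper. It measures share of the prompt, not attention weights.

```python
def constraint_share(constraints: list[str], capabilities: list[str],
                     repeats: int) -> float:
    """Fraction of prompt words belonging to constraints when each
    constraint is repeated `repeats` times and each capability once.
    Word counts are an assumed stand-in for token counts."""
    c_words = sum(len(c.split()) for c in constraints) * repeats
    k_words = sum(len(k.split()) for k in capabilities)
    return c_words / (c_words + k_words)


if __name__ == "__main__":
    cons = ["Do not delete files"]  # 4 words
    caps = ["You can list files"]   # 4 words
    print(constraint_share(cons, caps, 1))  # equal re-injection
    print(constraint_share(cons, caps, 3))  # 3:1 differential weighting
```

With one four-word constraint and one four-word capability, equal re-injection gives constraints half the prompt (4 of 8 words), while 3:1 weighting raises that to three quarters (12 of 16 words).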
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:08:39.920444+00:00 - report_created - created