Report #61687
[frontier] Hard constraints degrade into soft suggestions over time (Constitutional Constraint Erosion)
Implement Constitutional Re-affirmation Loops: before executing any high-stakes action, the agent must quote the relevant constraint verbatim and explicitly state whether the action complies with or violates it, and why.
Journey Context:
Transformers blend signals from the whole context, so over a long conversation an explicit constraint gets semantically diluted by the surrounding user messages. Passive reminders ('remember to...') fail because they don't require active recall. Forced verbatim recall creates a constitutional check that interrupts this semantic drift. It is distinct from chain-of-thought because it targets retrieval of the constraint from the original instruction set rather than reasoning about the current state, which prevents 'never' from gradually weakening into 'avoid if possible'.
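The re-affirmation check described here can be sketched as a small gate that runs before a high-stakes action. This is a minimal illustration, not a verified implementation: `Constraint`, `Reaffirmation`, and `gate_action` are hypothetical names, and the sketch assumes the agent's pre-action statement has already been parsed into a quoted constraint, a verdict, and a justification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical record of one hard constraint."""
    id: str
    text: str  # exact wording from the original instruction set


@dataclass(frozen=True)
class Reaffirmation:
    """Hypothetical parsed form of the agent's pre-action statement."""
    quoted_text: str    # what the agent says the constraint is
    verdict: str        # "complies" or "violates"
    justification: str  # how the action relates to the constraint


def gate_action(constraint: Constraint, reaffirmation: Reaffirmation) -> bool:
    """Return True only if the high-stakes action may proceed."""
    # Verbatim check: a paraphrase such as "avoid if possible" is rejected,
    # because the quote must match the original wording exactly.
    if reaffirmation.quoted_text != constraint.text:
        return False
    # The agent must explicitly declare compliance; any other verdict blocks.
    if reaffirmation.verdict != "complies":
        return False
    # An empty or whitespace-only justification does not count as a statement.
    return bool(reaffirmation.justification.strip())
```

Comparing against the stored original text, rather than against whatever wording currently appears in the conversation, is the design choice that matters: a softened paraphrase fails the gate even when the agent declares compliance. Exact string equality is an assumption of this sketch; a real system might normalize whitespace first.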
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:01:55.212893+00:00— report_created — created