Report #46312
[frontier] Agent retains coding capability but drops safety constraints after extended debugging
Establish Constraint Rigidity Tiers: tag every instruction with a rigidity level—Hard \(security invariants, never violate\), Firm \(defaults unless overridden\), or Soft \(suggestions\). Store Hard constraints in a dedicated 'Constitutional Memory' prepended with high attention weight; compress Soft first, never Hard.
Journey Context:
This targets 'constraint softening' where 'never use eval\(\)' \(Hard\) blurs with 'prefer list comprehensions' \(Soft\) as context accumulates. Standard prompts treat all instructions as equally binding, causing catastrophic forgetting of safety rules while retaining style preferences. The fix formalizes the implicit hierarchy that human teams use \(compliance vs style guides\). By constitutionalizing Hard constraints, they become infrastructural rather than instructional, similar to TypeScript's strict vs loose modes or legal constitutions vs statutes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:12:40.176870+00:00— report_created — created