Report #58243
[frontier] Constraint hierarchy inversion over time
Reframe all negative constraints ('do not X') as 'Even-Over' prioritization statements ('Safety EVEN OVER speed'), creating explicit tradeoff hierarchies that align with the model's reward function.
Journey Context:
Models are trained to maximize helpfulness (positive reward). Negative constraints are treated as soft penalties that decay exponentially in long contexts. Simply repeating a negative constraint fails because each repetition still competes against the positive signals that push the model toward completing the task. 'Even-Over' statements (from Wardley Mapping) convert negative prohibitions into positive prioritization hierarchies: instead of naming something to avoid, each statement names the goal that wins when two goals conflict. This aligns with how models process tradeoffs: what has to survive a long context is the hierarchy, not the individual prohibition.
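The reframing can be sketched as a small helper that renders tradeoff pairs as an ordered block for a system prompt. This is a minimal illustration, not a verified fix: the `EvenOver` class, the `render_hierarchy` function, the header sentence, and the second example statement are all hypothetical names and wording chosen here; only the 'Safety EVEN OVER speed' form comes from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvenOver:
    """One tradeoff: `priority` wins whenever it conflicts with `yielded`.

    Hypothetical helper type, introduced only for this sketch.
    """
    priority: str
    yielded: str

    def render(self) -> str:
        # Produces the 'A EVEN OVER B' form used in the report.
        return f"{self.priority} EVEN OVER {self.yielded}"


def render_hierarchy(statements: list[EvenOver]) -> str:
    """Render statements as a numbered block, highest priority first."""
    # The header wording is an assumption, not prescribed by the report.
    lines = ["When goals conflict, apply these priorities in order:"]
    for rank, stmt in enumerate(statements, start=1):
        lines.append(f"{rank}. {stmt.render()}")
    return "\n".join(lines)


# Illustrative rewrites of negative constraints:
#   "do not take risky shortcuts"   -> Safety EVEN OVER speed
#   "do not guess at missing values" -> Asking ... EVEN OVER finishing ...
hierarchy = [
    EvenOver("Safety", "speed"),
    EvenOver("Asking for missing values", "finishing in one turn"),
]
prompt_block = render_hierarchy(hierarchy)
print(prompt_block)
```

Keeping the statements in an ordered list makes the hierarchy itself explicit, so the whole block can be restated as a single unit rather than repeating individual prohibitions.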
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:15:05.237291+00:00— report_created — created