Report #91066
[frontier] Different types of constraints decay at different rates, and uniform re-injection wastes tokens
Build a constraint hierarchy with three tiers: safety constraints (re-inject every 15-20 turns), domain/business-logic constraints (every 10-15 turns), and style/persona constraints (every 5-8 turns). Safety constraints decay slowest because the model's safety training reinforces them. Style constraints decay fastest, so they need the most frequent re-injection.
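A minimal Python sketch of the tier schedule, assuming re-injection is triggered by turn count. `Tier`, `TIER_INTERVAL`, `tiers_due`, and `injection_counts` are hypothetical names. The intervals are single values picked from the ranges above and assigned by decay rate, following the ordering this report describes: the fastest-decaying tier (style) is refreshed most often and the slowest (safety) least often.

```python
from enum import Enum
from typing import Dict, List, Mapping


class Tier(Enum):
    SAFETY = "safety"
    DOMAIN = "domain"
    STYLE = "style"


# Hypothetical intervals (turns between re-injections): one value picked from
# each range in this report, ordered by decay rate so the fastest-decaying
# tier (style) is refreshed most often and the slowest (safety) least often.
TIER_INTERVAL: Dict[Tier, int] = {
    Tier.STYLE: 6,
    Tier.DOMAIN: 12,
    Tier.SAFETY: 18,
}


def tiers_due(turn: int, intervals: Mapping[Tier, int] = TIER_INTERVAL) -> List[Tier]:
    """Return the tiers to re-inject on a 1-indexed turn.

    A tier is due on every turn that is a multiple of its interval.
    """
    if turn < 1:
        raise ValueError("turns are 1-indexed")
    return [tier for tier, every in intervals.items() if turn % every == 0]


def injection_counts(
    total_turns: int, intervals: Mapping[Tier, int] = TIER_INTERVAL
) -> Dict[Tier, int]:
    """Count how many times each tier is re-injected over total_turns turns."""
    return {tier: total_turns // every for tier, every in intervals.items()}


if __name__ == "__main__":
    print(tiers_due(12))  # style and domain are due; safety is not
    uniform = {tier: 12 for tier in Tier}
    print(injection_counts(36))  # tiered: style 6, domain 3, safety 2
    print(injection_counts(36, uniform))  # uniform: every tier 3
```

Over 36 turns, a uniform 12-turn interval re-injects every tier 3 times. The tiered table re-injects style constraints 6 times and safety constraints only twice, which is the token trade-off the report describes.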
Journey Context:
In production, constraints decay in a tiered pattern. Safety constraints (e.g., 'never delete production data') decay slowest because they align with the safety behavior the model was trained for through RLHF. Domain constraints (e.g., 'use repository pattern for data access') decay at a medium rate. Style constraints (e.g., 'use Oxford comma') decay fastest because they are the most superficial and the least reinforced by training.

Uniform re-injection therefore wastes tokens: it over-injects safety constraints, which persist on their own, and under-injects style constraints, which fade first. Tiered re-injection matches each tier's frequency to its decay rate and spends the token budget where it is needed. Anthropic's published guidance for its models, which ranks safety above helpfulness, plausibly reflects the same training emphasis that makes safety constraints more persistent. The implementation is straightforward: tag each constraint with its tier and set its re-injection frequency accordingly.
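The tagging step can be sketched in Python as follows, assuming constraints are plain strings re-sent together as one reminder message. `Constraint`, `INTERVAL`, and `reinjection_block` are hypothetical names, the interval values are illustrative, and where the reminder is placed in the prompt (system message, user turn, or elsewhere) is left open.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical re-injection intervals in turns, keyed by tier tag; the
# fastest-decaying tier (style) gets the shortest interval.
INTERVAL = {"style": 6, "domain": 12, "safety": 18}


@dataclass(frozen=True)
class Constraint:
    text: str
    tier: str  # must be one of the keys of INTERVAL

    def __post_init__(self) -> None:
        # Reject tags that have no interval so typos fail early.
        if self.tier not in INTERVAL:
            raise ValueError(f"unknown tier: {self.tier!r}")


def reinjection_block(constraints: List[Constraint], turn: int) -> Optional[str]:
    """Build the reminder text for a 1-indexed turn, or None if nothing is due."""
    due = [c for c in constraints if turn % INTERVAL[c.tier] == 0]
    if not due:
        return None
    lines = [f"- [{c.tier}] {c.text}" for c in due]
    return "Reminder of active constraints:\n" + "\n".join(lines)


# Example constraints taken from the report, each tagged with its tier.
CONSTRAINTS = [
    Constraint("Never delete production data.", "safety"),
    Constraint("Use the repository pattern for data access.", "domain"),
    Constraint("Use the Oxford comma.", "style"),
]

if __name__ == "__main__":
    for turn in (6, 7, 12, 36):
        print(turn, reinjection_block(CONSTRAINTS, turn))
```

With these values, turn 6 re-sends only the style rule, turn 12 the domain and style rules, and turn 36 all three. Turn 7 produces no reminder and costs no extra tokens.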
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:27:01.960418+00:00— report_created — created