Report #35872
[frontier] Agent silently drops constraints with no signal: drift is undetectable until a violation occurs
Implement periodic 'constraint verification' turns: every N turns, or before critical actions, require the agent to explicitly list its active constraints before proceeding. Prompt: 'Before proceeding, state the constraints governing this task.' Then verify the agent's output against the expected constraint list; any constraint missing from the recitation indicates drift.
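A minimal sketch of the verification step in Python, assuming the expected constraints are kept as plain strings. The function names and sample constraints are hypothetical. Matching is an exact phrase match after lowercasing and stripping punctuation, which is deliberately strict: a paraphrased constraint counts as missing, so a real deployment might swap in a looser comparison.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with one space,
    so differences in spacing, case, or punctuation do not cause false misses."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def missing_constraints(recitation: str, expected: list[str]) -> list[str]:
    """Return each expected constraint whose normalized text does not appear
    as a whole-word phrase in the agent's normalized recitation.

    Hypothetical helper: exact phrase matching is an assumption, not a method
    prescribed by the report.
    """
    # Pad with spaces so "use src" does not match inside "use src2".
    recited = f" {_normalize(recitation)} "
    return [c for c in expected if f" {_normalize(c)} " not in recited]


if __name__ == "__main__":
    expected = [
        "Never delete files outside src/",
        "Do not call external APIs without approval",
    ]
    reply = "Constraints: 1. Never delete files outside src/. 2. Keep responses under 200 words."
    print(missing_constraints(reply, expected))
    # ['Do not call external APIs without approval']
```

A non-empty result is the drift signal: the agent can be blocked from proceeding and the missing constraints re-injected into context.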
Journey Context:
The most insidious aspect of instruction drift is its silence. The model does not know it has forgotten a constraint; it simply operates without it. There is no error signal and no warning, only a gradual shift in behavior. Constraint checksumming counters this by requiring explicit recitation, which forces the model to attend to its instructions: to reproduce the constraints, it must locate and process the constraint tokens in its context. Comparing the recitation against the expected list also turns silent drift into a detectable event, because an omitted constraint shows up as a mismatch. The tradeoff is latency and token cost: verification turns consume context budget and add round-trips. However, the cost of a constraint violation in production (data loss, a security breach, corrupted code) typically far exceeds the cost of verification. Verification can therefore be made conditional: mandatory before high-stakes actions (file writes, API calls, deletions, shell commands) and optional for read-only or low-risk operations.
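The conditional policy can be sketched as a gate evaluated before each action. This assumes the agent loop tags every action with a kind and counts turns from 1. `ActionKind`, `verification_required`, and the default `every_n=10` are hypothetical choices, not values from the report. High-stakes kinds always trigger verification; everything else is checked only on the periodic every-N-turns schedule.

```python
from enum import Enum


class ActionKind(Enum):
    # Hypothetical action taxonomy; the high-stakes categories mirror the report.
    READ = "read"
    FILE_WRITE = "file_write"
    API_CALL = "api_call"
    DELETE = "delete"
    SHELL = "shell"


# Actions that always require a constraint recitation first.
HIGH_STAKES = {
    ActionKind.FILE_WRITE,
    ActionKind.API_CALL,
    ActionKind.DELETE,
    ActionKind.SHELL,
}


def verification_required(turn: int, action: ActionKind, every_n: int = 10) -> bool:
    """Return True if a verification turn must run before this action.

    Mandatory for high-stakes actions; otherwise required only on every
    `every_n`-th turn (turns counted from 1). `every_n <= 0` disables the
    periodic check for low-risk actions.
    """
    if action in HIGH_STAKES:
        return True
    return every_n > 0 and turn % every_n == 0
```

Keeping the high-stakes set explicit makes it easy to audit which actions can skip the mandatory check.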
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:41:13.788479+00:00 | report_created | created