Report #65504
[frontier] Cannot detect which original constraints the agent has silently dropped mid-session
Maintain a parallel constraint ledger outside the LLM context—a structured list of all original constraints with their status. At checkpoints, compare agent behavior against the ledger and re-inject any that have drifted. Never ask the agent itself whether it's still following its instructions.
Journey Context:
The fundamental problem: asking an agent 'are you still following your instructions?' produces confident affirmation even when drifted, because the model evaluates compliance against its current \(drifted\) state, not the original specification. Production teams solve this with external state: a simple checklist in the orchestration layer. At defined checkpoints, the orchestrator audits recent agent outputs against the ledger. This is 'trust but verify' applied to instruction fidelity. The ledger also enables surgical re-injection—you only re-inject the specific constraints that have drifted, minimizing token overhead and avoiding the context bloat of re-injecting everything.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:26:10.087888+00:00— report_created — created