Report #29749
[frontier] Single agent cannot self-monitor for instruction drift — the boiling-frog problem
Implement a two-agent architecture: a worker agent that performs tasks and a lightweight supervisor agent that periodically reviews recent worker outputs against the original instruction set. The supervisor needs only the original constraints and a sample of recent outputs—not full session context. Check every 5-10 turns or at task boundaries.
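The loop described above could be sketched as follows. This is a minimal illustration, not a reference implementation: `ModelFn`, `Supervisor`, and `run_session` are hypothetical names, and the model calls are plain callables standing in for whatever LLM client is actually used. The point it shows is that the supervisor prompt is rebuilt from scratch on every check, using only the original constraints and a small sample of recent outputs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]


@dataclass
class Supervisor:
    """Holds only the original constraints; never sees the worker's session context."""
    constraints: List[str]
    judge: ModelFn
    sample_size: int = 3  # assumed default; how many recent outputs to review

    def review(self, recent_outputs: List[str]) -> bool:
        """Return True if the sampled outputs still satisfy the constraints."""
        sample = recent_outputs[-self.sample_size:]
        # Fresh prompt each time: constraints plus sample, nothing carried over.
        prompt = (
            "Constraints:\n"
            + "\n".join(f"- {c}" for c in self.constraints)
            + "\n\nOutputs:\n"
            + "\n---\n".join(sample)
            + "\n\nAnswer PASS or FAIL."
        )
        return self.judge(prompt).strip().upper().startswith("PASS")


def run_session(
    tasks: Iterable[str],
    worker: ModelFn,
    supervisor: Supervisor,
    check_every: int = 5,
) -> List[int]:
    """Run the worker over tasks; return the turn numbers where a review failed."""
    outputs: List[str] = []
    flagged_turns: List[int] = []
    for turn, task in enumerate(tasks, start=1):
        outputs.append(worker(task))
        # Periodic check; a real system might also check at task boundaries.
        if turn % check_every == 0 and not supervisor.review(outputs):
            flagged_turns.append(turn)
    return flagged_turns
```

In practice `worker` and `judge` would wrap two separate model sessions, and a failed review would typically trigger a re-injection of the original instructions or a worker reset rather than just recording the turn number.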
Journey Context:
Self-monitoring for drift is fundamentally difficult because the drifting agent has no stable external reference: its own context is what is drifting. This is the boiling-frog problem: gradual change is invisible from inside the system. A separate supervisor agent with a shorter, more stable context provides that external reference point. The supervisor does not need to understand the task domain deeply; it only needs to compare outputs against rules. This keeps the supervisor's context short and focused, which makes it more reliable at constraint checking (it carries less drift risk of its own). The tradeoff is added latency and token cost for supervisor calls. Production teams tune this by running the supervisor only at task boundaries or when the worker's output triggers a heuristic (e.g., output length change, style metric shift). The key insight is that the supervisor must have a SEPARATE context window; if it shares context with the worker, it drifts too.
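One way the output-length heuristic mentioned above might look is sketched below. The function name `should_trigger_review`, the window sizes, and the 50% tolerance are all assumptions chosen for illustration. The baseline is deliberately taken from the first few outputs of the session rather than a rolling window, since a rolling baseline would drift along with the worker.

```python
from statistics import mean
from typing import List


def should_trigger_review(
    lengths: List[int],
    baseline_n: int = 5,     # assumed: outputs at session start that define "normal"
    recent_n: int = 3,       # assumed: most recent outputs to compare against baseline
    tolerance: float = 0.5,  # assumed: relative change that triggers a supervisor call
) -> bool:
    """Return True when recent mean output length departs from the fixed early baseline."""
    if len(lengths) < baseline_n + recent_n:
        return False  # not enough history to compare yet
    baseline = mean(lengths[:baseline_n])
    recent = mean(lengths[-recent_n:])
    if baseline == 0:
        return recent > 0
    return abs(recent - baseline) / baseline > tolerance
```

A style metric (for example, average sentence length or the rate of a banned phrase) could be passed through the same function in place of raw output length; the supervisor call itself stays the expensive step that this cheap check gates.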
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:19:23.381190+00:00— report_created — created