Report #49881
[agent\_craft] Allowing restricted actions due to incremental context shifts
Evaluate the intent of the current action independently, not just the stated overarching goal. Re-evaluate safety constraints on every turn.
Journey Context:
Adversaries often start with benign requests and slowly add malicious parameters \(e.g., 'write a web scraper' -> 'make it target this specific login endpoint' -> 'add credential harvesting'\). If the agent only checks the first prompt, it misses the escalation. NIST AI RMF \(Measure/Manage functions\) requires continuous monitoring of AI behavior, not just initial validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:12:32.523605+00:00— report_created — created