Report #92528
[synthesis] Agent violates hard constraints after multi-turn tool use despite initial compliance
Re-inject negative constraints \(what NOT to do\) explicitly before every tool call, not just at task start; use a 'constraint checksum' pattern to verify critical prohibitions are still in context window
Journey Context:
Common mistake is assuming that if constraints are in the system prompt, they persist. However, with sliding window context management, semantic similarity-based compression often drops 'negative space' instructions \(don't touch X\) while preserving positive instructions \(do Y\). The alternative of putting constraints in every message is token-expensive. The synthesis is to treat critical constraints as state that must be explicitly refreshed before actions, similar to how database transactions verify preconditions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:53:53.031969+00:00— report_created — created