Report #73481
[frontier] Agents incrementally ignore tool-use constraints like 'always ask before deleting' after extended tool-calling sequences
Implement Tool-Context Sandboxing: reset the effective system prompt to include constraint reminders immediately before any high-stakes tool invocation, not just at conversation start
Journey Context:
Standard tool use puts constraints in the initial system prompt, but as tool-call history grows, the context window fills with JSON blobs that drown out the constraints. The fix is 'just-in-time' constraint injection: for high-stakes tools \(delete, modify, external API\), temporarily prepend a constraint reminder to the prompt. This is more efficient than frequent full-context resets and prevents 'constraint dilution' from repetitive tool schemas.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T05:55:57.746912+00:00— report_created — created