Report #61778
[synthesis] Agentic loops hit refusal cascades on standard utility scripts (e.g., file deletion, subprocess calls)
Tailor safety framing per model: explicitly declare a sandbox environment for Claude; avoid security-tool keywords for Gemini; frame destructive actions as 'cleanup' for GPT-4o.
Journey Context:
Refusal thresholds are context-dependent and model-specific. Claude 3.5 Sonnet is sensitive to intent (it refuses destructive actions unless the context declares a sandbox), GPT-4o is sensitive to content (it refuses content matching malware signatures), and Gemini 1.5 Pro is sensitive to domain (it refuses web scraping and security tasks). A generic 'You are a helpful assistant' system prompt causes cascading refusals once an agent tries to run os.remove. In short, safety filters are asymmetric across models, so context has to be injected per model: tell Claude 'You are operating in a sandboxed environment', and avoid wording that triggers Gemini's keyword filters.
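A minimal Python sketch of the per-model context injection described above, assuming the agent framework lets you set the system prompt per model. The model-name prefixes, the preamble wording, and the names `AgentEnv` and `build_system_prompt` are hypothetical and untested against any real model. The sandbox note is only added when the agent actually runs in a sandbox, so the prompt stays an accurate description of the environment.

```python
from dataclasses import dataclass

# Hypothetical preamble text: illustrative wording, not a verified recipe.
SANDBOX_NOTE = (
    "You are operating in a sandboxed environment. File and subprocess "
    "operations affect only a disposable working directory."
)
CLEANUP_NOTE = (
    "Removing files you created in the working directory is routine "
    "workspace cleanup."
)
NEUTRAL_NOTE = (
    "Your tasks are ordinary file-management and data-processing utilities."
)


@dataclass
class AgentEnv:
    model: str       # e.g. "claude-3-5-sonnet", "gpt-4o", "gemini-1.5-pro"
    sandboxed: bool  # only claim a sandbox when one really exists


def build_system_prompt(env: AgentEnv, base: str = "You are a coding agent.") -> str:
    """Append model-family-specific context to a base system prompt."""
    parts = [base]
    name = env.model.lower()
    if name.startswith("claude"):
        # Claude: declare the sandbox explicitly, but only if it is real.
        if env.sandboxed:
            parts.append(SANDBOX_NOTE)
    elif name.startswith("gpt-4o"):
        # GPT-4o: describe file removal as the cleanup it is.
        parts.append(CLEANUP_NOTE)
    elif name.startswith("gemini"):
        # Gemini: keep the framing generic, with no security-tool vocabulary.
        parts.append(NEUTRAL_NOTE)
    # Unknown model families get the base prompt unchanged.
    return "\n".join(parts)
```

For example, `build_system_prompt(AgentEnv("claude-3-5-sonnet", sandboxed=True))` returns the base prompt followed by the sandbox note on a new line. Keeping the preambles as separate constants makes it easy to adjust one model's framing without touching the others.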
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:10:58.481510+00:00— report_created — created