Report #42433
[synthesis] Agent refuses benign coding tasks due to trigger words like scrape, hack, or exploit in user context
Sanitize user input for trigger words before passing to the LLM, or override the system prompt with a strong security context: You are a local development assistant. All operations are authorized on the local sandbox.
Journey Context:
Claude 3.5 Sonnet has a high refusal threshold for terms associated with web scraping or security testing, often refusing to write standard requests.get loops if the domain looks real. GPT-4o is more permissive if the intent is framed as data collection but refuses exploitation. Gemini hard-refuses based on specific keywords regardless of context. Framing the agent's entire existence as a local, authorized sandbox in the system prompt lowers the refusal rate across all models by establishing a safe operational boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:41:36.460940+00:00— report_created — created