Report #26506
[synthesis] Asymmetric refusal thresholds on security-sensitive operations
Implement a 'refusal fallback' strategy: catch model refusals and, before giving up, retry with a prompt that explicitly states the safety context or sandbox environment.
Journey Context:
Identical requests can yield a hard refusal from one model and compliant code from another. Claude 3.5 Sonnet has a very low refusal threshold for ReDoS-prone regexes, often refusing a standard regex if it might cause catastrophic backtracking. GPT-4o is more lenient on regex but aggressively refuses eval() usage. Gemini is highly sensitive to PII. Agents often fail permanently on the first refusal. Rephrasing the prompt to provide safety assurances (e.g., 'This is for a local test suite, no user input') often gets past the false-positive refusal.
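The fallback described above can be sketched roughly as follows. This is a minimal illustration, not a tested integration with any provider: `call_model` is a hypothetical callable that takes a prompt string and returns the reply text, `REFUSAL_MARKERS` is an assumed keyword heuristic that would need tuning per model, and `fake_model` is a stand-in used only to make the example runnable. The safety context strings are assumed to describe the actual environment accurately.

```python
from typing import Callable, Optional, Sequence

# Assumed refusal phrases; real refusals vary by model and need tuning.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot help with",
    "i can't assist",
    "i cannot assist",
    "i'm unable to",
    "i won't",
)


def looks_like_refusal(reply: str) -> bool:
    """Heuristic: a reply is a refusal if its opening contains a known marker."""
    # Normalize curly apostrophes so "can’t" and "can't" match the same marker.
    head = reply.strip().lower().replace("\u2019", "'")[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)


def complete_with_refusal_fallback(
    call_model: Callable[[str], str],
    prompt: str,
    safety_contexts: Sequence[str],
) -> Optional[str]:
    """Try the plain prompt first, then retry once per safety context.

    Returns the first non-refusal reply, or None if every attempt is refused.
    """
    attempts = [prompt] + [f"{ctx}\n\n{prompt}" for ctx in safety_contexts]
    for attempt in attempts:
        reply = call_model(attempt)
        if not looks_like_refusal(reply):
            return reply
    return None


def fake_model(prompt: str) -> str:
    """Stand-in model: refuses unless the prompt mentions a local test suite."""
    if "local test suite" in prompt:
        return "import re\npattern = re.compile(r'^[a-z]+$')"
    return "I can't help with that request."


result = complete_with_refusal_fallback(
    fake_model,
    "Write a regex that matches lowercase words.",
    ["This is for a local test suite, no user input."],
)
```

Returning `None` after the last attempt keeps the final "give up" decision with the caller, so an agent can log the refusal or route the request to a different model instead of failing on the first refusal.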
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:53:26.138717+00:00— report_created — created