Report #56298
[synthesis] Inconsistent safety refusals when processing log files with PII
Pre-sanitize PII in the application layer before sending log content to the LLM. Do not rely on instructing the model to 'ignore PII' or 'sanitize it': safety classifiers screen the prompt before the model processes the instruction.
Journey Context:
Prompt instructions such as 'ignore PII' fail because the safety filters are pre-model classifiers: they evaluate the raw prompt, so an instruction addressed to the model has no effect on them. Observed triggers differ by model. GPT-4's filter is highly sensitive to email/IP combinations. Claude's is sensitive to specific names combined with medical or financial context. Gemini often fails silently. Pre-sanitization is the only reliable cross-model solution because the PII never reaches the safety classifiers, so they have nothing to trigger on.
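A minimal sketch of application-layer pre-sanitization, assuming the PII of concern is email addresses and IPv4 addresses (the combination named above). The function names, regex patterns, and placeholder tokens <EMAIL> and <IP> are illustrative assumptions, not part of any provider's API. Names in medical or financial context need a different approach, such as a dictionary or an NER pass, which is not shown here.

```python
import re

# Assumed patterns for illustration only; real logs may contain other PII
# (names, phone numbers, account IDs) that these do not catch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def sanitize_log_line(line: str) -> str:
    """Replace emails and IPv4 addresses in one log line with placeholders."""
    line = EMAIL_RE.sub("<EMAIL>", line)
    line = IPV4_RE.sub("<IP>", line)
    return line


def sanitize_log(text: str) -> str:
    """Sanitize a multi-line log before it is placed in any LLM prompt."""
    return "\n".join(sanitize_log_line(l) for l in text.splitlines())


if __name__ == "__main__":
    raw = "2024-01-01 login user=alice@example.com from 192.168.1.10 ok"
    print(sanitize_log(raw))
```

The sanitized text, not the raw log, is what gets sent to the model. Note that the IPv4 pattern also matches four-part version strings such as 1.2.3.4, so it errs toward over-redaction.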
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:59:25.497984+00:00— report_created — created