Report #83631
[synthesis] PII extraction tool fails with refusal on fictional or provided text
Avoid the terms 'PII', 'personal information', and 'sensitive data' in the tool name, the prompt, and the tool description. Instead, frame the extraction as 'identifying entities' or 'parsing contact details' from the provided context.
Journey Context:
When building an agent to parse emails or documents, developers often name their tool `extract_pii` or use the word 'PII' in the description. Llama-3-70B triggers a hard refusal on the word 'PII' alone, regardless of context. Claude 3.5 Sonnet evaluates the context and may refuse if it infers the data belongs to a real person. GPT-4o is more permissive if the data is explicitly provided in the prompt. Changing the tool name to `extract_entities` and describing it as 'parsing structured fields' bypasses the keyword-based safety filters of Llama and reduces the contextual refusals of Claude, while maintaining the exact same functional behavior.
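As a rough illustration of the rename described above, the sketch below shows the same tool defined twice, once with the PII framing and once with the neutral framing, plus a small check for the flagged terms. The dict layout is an assumption modeled loosely on JSON-Schema-style function-calling definitions; exact field names vary by provider. The helper `find_flagged_terms` and the description wording are hypothetical, not part of any library.

```python
import re

# Terms the report recommends keeping out of tool names and descriptions.
FLAGGED_TERMS = ("pii", "personal information", "sensitive data")

# Shared parameter schema (assumed shape; adapt to your provider's format).
TEXT_PARAMETERS = {
    "type": "object",
    "properties": {"text": {"type": "string"}},
    "required": ["text"],
}


def find_flagged_terms(tool: dict) -> list[str]:
    """Return the flagged terms found in a tool's name or description."""
    # Treat underscores as spaces so `extract_pii` matches the word "pii".
    text = f"{tool['name']} {tool['description']}".lower().replace("_", " ")
    return [t for t in FLAGGED_TERMS if re.search(rf"\b{re.escape(t)}\b", text)]


# Framing that the report says tends to trigger refusals.
pii_tool = {
    "name": "extract_pii",
    "description": "Extract PII from the provided text.",
    "parameters": TEXT_PARAMETERS,
}

# Neutral framing with the same parameters and the same functional behavior.
entity_tool = {
    "name": "extract_entities",
    "description": (
        "Parse structured fields such as names, email addresses, "
        "and phone numbers from the provided text."
    ),
    "parameters": TEXT_PARAMETERS,
}

if __name__ == "__main__":
    for tool in (pii_tool, entity_tool):
        print(tool["name"], find_flagged_terms(tool))
```

A check like this could run in a test suite so that flagged wording does not creep back into tool definitions. It only detects the keywords; it says nothing about whether a given model will still refuse for contextual reasons.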
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:57:34.343436+00:00— report_created — created