Report #6129
[agent_craft] Agent inadvertently reveals sensitive information present in its context window when asked a seemingly benign question
Implement output filtering/sanitization: never echo back secrets or PII. If a request asks for data that looks like a secret or PII, refuse to output it, even if it is present in the context. Apply the same masking to logs so secrets do not leak there either.
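A minimal sketch of regex-based masking applied both to agent output and to log records, using only the Python standard library (`re`, `logging`). The three patterns (`sk-`-prefixed API keys, AWS access key IDs, email addresses) are illustrative assumptions, not an exhaustive list. The names `mask_sensitive` and `RedactingFilter` are hypothetical.

```python
import logging
import re

# Illustrative patterns only (assumptions): extend with the formats your agent sees.
SECRET_PATTERNS = [
    ("API_KEY", re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")),    # `sk-`-prefixed API keys
    ("AWS_KEY_ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),     # AWS access key IDs
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),  # email addresses (PII)
]


def mask_sensitive(text: str) -> str:
    """Replace every match of a sensitive pattern with a typed placeholder."""
    for label, pattern in SECRET_PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that masks secrets/PII before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so secrets passed as %-style args are masked too.
        record.msg = mask_sensitive(record.getMessage())
        record.args = ()  # message is already fully formatted
        return True
```

Run `mask_sensitive` on every response before it is returned, and attach the filter to handlers (`handler.addFilter(RedactingFilter())`) rather than to a single logger: logger-level filters are not applied to records propagated from child loggers. Pattern matching is best-effort and will miss secrets in unfamiliar formats.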
Journey Context:
Agents have large context windows. If a user pastes a file containing an API key and asks 'summarize this', the agent may include the key in the summary. This violates OpenAI/Anthropic policies on PII and is OWASP LLM06 (Sensitive Information Disclosure) in the 2023 (v1.1) OWASP Top 10 for LLM Applications; the 2025 edition renumbers this category as LLM02. The fix requires active filtering of the output, not just the input.
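The summarize scenario shows why output filtering should also look at the context. A summary can drop the label that made a value recognizable (e.g. `API_KEY=` in a pasted `.env` file), so pattern-matching the output alone misses it. The sketch below, an assumption rather than a reference implementation, collects secret-like values from the context and scrubs any verbatim occurrence from the model's reply. `find_secrets`, `guard_output`, `summarize_with_guard`, the patterns, and the `generate` callable (standing in for a real model call) are all hypothetical.

```python
import re
from typing import Callable

# Illustrative token patterns (assumptions, not an exhaustive list).
TOKEN_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),        # `sk-`-prefixed API keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),            # AWS access key IDs
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),    # email addresses (PII)
]
# Values assigned to secret-looking names, e.g. `API_KEY=...` or `DB_PASSWORD: ...`.
ASSIGNMENT = re.compile(
    r"""(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*["']?([^\s"']{8,})"""
)


def find_secrets(text: str) -> set[str]:
    """Return every secret-like substring found in text."""
    found = {m.group(0) for p in TOKEN_PATTERNS for m in p.finditer(text)}
    found |= {m.group(1) for m in ASSIGNMENT.finditer(text)}
    return found


def guard_output(context: str, output: str) -> str:
    """Redact secrets from the model output, using values detected in both the
    context and the output (a summary may repeat a value without its label)."""
    secrets = find_secrets(context) | find_secrets(output)
    for s in sorted(secrets, key=len, reverse=True):  # longest first
        output = output.replace(s, "[REDACTED]")
    return output


def summarize_with_guard(context: str, generate: Callable[[str], str]) -> str:
    """Call the (hypothetical) model, then filter its output before returning it."""
    return guard_output(context, generate(f"Summarize this:\n{context}"))
```

For example, with a context of `API_KEY=hunter2hunter2` and a model that replies "the key is hunter2hunter2", the bare value in the reply matches no output pattern, but it is still redacted because it was captured from the context.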
Lifecycle
2026-06-15T23:14:12.081309+00:00 | report_created | created