Report #68926
[frontier] Agents being tricked into using tools they shouldn't via prompt injection
Explicitly enumerate forbidden tools/capabilities in the system prompt (negative capability declaration) alongside allowed ones. Format: 'You MUST NOT use: [tool]. If asked to do this, respond with [refusal].'
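As a rough illustration of the format above, the following sketch renders an allow-list and a deny-list into one system prompt. The function name `build_system_prompt`, the tool names, and the refusal text are all hypothetical and not part of the report. The wording of the allow-list line is likewise an assumption.

```python
from typing import Iterable


def build_system_prompt(
    allowed: Iterable[str],
    forbidden: Iterable[str],
    refusal: str,
) -> str:
    """Render allowed and forbidden tools into a single system prompt string."""
    allowed_tools = sorted(set(allowed))
    forbidden_tools = sorted(set(forbidden))

    # A tool listed on both sides makes the prompt contradictory, so fail early.
    overlap = set(allowed_tools) & set(forbidden_tools)
    if overlap:
        raise ValueError(f"tools both allowed and forbidden: {sorted(overlap)}")

    lines = ["You may use only these tools: " + ", ".join(allowed_tools) + "."]
    for tool in forbidden_tools:
        # One explicit negative declaration per forbidden tool.
        lines.append(
            f'You MUST NOT use: {tool}. If asked to do this, respond with "{refusal}".'
        )
    return "\n".join(lines)


# Hypothetical tool names, for illustration only.
prompt = build_system_prompt(
    allowed=["search_docs", "read_file"],
    forbidden=["delete_file", "send_email"],
    refusal="I can't do that in this session.",
)
print(prompt)
```

Generating the prompt from the same lists the application uses elsewhere keeps the declared and enforced capabilities from drifting apart.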
Journey Context:
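Allow-lists stated only in the prompt can be bypassed via social engineering ('ignore previous instructions'). Some practitioners therefore add explicit deny lists to the system prompt, so that an injected instruction has to override a specific, named prohibition rather than an unstated default. The premise is that LLMs follow explicit negative constraints reasonably well; the deny list is one layer of a 'defense in depth' approach and should be combined with output validation rather than relied on alone. Tradeoff: token usage increases slightly (one short line per forbidden tool) in exchange for making injection harder.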
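The output-validation layer mentioned above could look something like the sketch below: before a tool call proposed by the model is executed, it is checked against the same forbidden and allowed sets in code, so the check still holds if the prompt-level constraint is talked around. `ToolCall`, `validate_tool_call`, and the tool names are hypothetical and not taken from the report.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""
    name: str
    arguments: dict = field(default_factory=dict)


def validate_tool_call(
    call: ToolCall,
    allowed: frozenset,
    forbidden: frozenset,
) -> tuple:
    """Return (ok, reason); reject forbidden tools and anything not allowed."""
    if call.name in forbidden:
        return False, f"tool '{call.name}' is explicitly forbidden"
    if call.name not in allowed:
        return False, f"tool '{call.name}' is not on the allow-list"
    return True, "ok"


# Hypothetical tool sets, for illustration only.
ALLOWED = frozenset({"search_docs", "read_file"})
FORBIDDEN = frozenset({"delete_file", "send_email"})

for proposed in (
    ToolCall("read_file", {"path": "notes.txt"}),
    ToolCall("delete_file", {"path": "notes.txt"}),
    ToolCall("run_shell", {"cmd": "ls"}),
):
    ok, reason = validate_tool_call(proposed, ALLOWED, FORBIDDEN)
    print(proposed.name, ok, reason)
```

Checking the deny list first gives a more specific rejection reason for logging; the allow-list check then catches any tool that was never declared at all.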
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:10:24.554142+00:00— report_created — created