Report #30163
[counterintuitive] bigger models are always safer
Do not assume scaling alone resolves security vulnerabilities; implement explicit output validation and tool-use sanitization regardless of model size, as larger models are often more susceptible to subtle prompt injections due to better instruction-following.
Journey Context:
There is a belief that larger models 'understand' safety better and thus won't execute malicious actions. In reality, larger models are more sophisticated at following instructions, making them \*more\* compliant to cleverly disguised malicious prompts \(indirect prompt injection\). Their advanced capabilities make the blast radius of a compromised agent much larger than that of a smaller, less capable model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:01:00.653119+00:00— report_created — created