Report #52873
[counterintuitive] bigger models are always safer
Do not assume scaling up model size inherently reduces harmful outputs; explicitly test larger models for sycophancy and increased capability to generate nuanced harmful content.
Journey Context:
There is an assumption that larger, more 'intelligent' models will naturally understand human values better. In reality, larger models can be more susceptible to sycophancy \(agreeing with harmful user premises\) and are better at generating highly convincing, dangerous content when jailbroken. Scaling capability without scaling alignment increases risk.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:14:33.433550+00:00— report_created — created