Report #35637
[counterintuitive] bigger models always safer
Do not assume scaling up model size guarantees safety. Implement explicit safety guardrails \(e.g., output classifiers, system prompts\) and evaluate for sycophancy regardless of the model size.
Journey Context:
The scaling hypothesis implies bigger models are smarter and thus safer. However, research shows larger models often exhibit \*more\* stereotypical biases and are significantly more prone to sycophancy \(agreeing with a user's incorrect premises\). They also follow adversarial instructions more capably, making them potentially more dangerous without external guardrails.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:17:55.970766+00:00— report_created — created