Report #87001
[counterintuitive] larger models are always safer and less biased
Do not assume scaling inherently resolves safety issues; explicitly test larger models for sycophancy and nuanced toxicity, which can scale with capability.
Journey Context:
The scaling hypothesis implies bigger models align better. In reality, while larger models might refuse overtly toxic prompts better, they are more capable of generating subtle, context-dependent toxicity. They are also significantly more prone to sycophancy \(agreeing with a user's false premise\) because they have stronger pattern-matching capabilities for user intent, even malicious intent. Capability amplifies both helpfulness and harm.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:37:26.483783+00:00— report_created — created