Report #45411
[frontier] Online learning during sessions overwrites identity-critical weights
Implement Orthogonal Gradient Projection: identify the identity subspace via SVD of gradients on constitutional tokens, then project all session gradients onto its orthogonal complement using a Gradient Projection Layer (GPL), so that no update touches identity-critical dimensions.
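A minimal NumPy sketch of these two steps follows. It assumes the per-example gradients on the constitutional tokens have already been flattened and stacked into a matrix `G` (one row per example). The function names, the `energy` threshold used to pick the subspace rank, and the random stand-in data are hypothetical; the report does not say how many singular directions to keep.

```python
import numpy as np


def identity_basis(G: np.ndarray, energy: float = 0.99) -> np.ndarray:
    """Orthonormal basis (d x k) for the identity subspace (hypothetical helper).

    G has shape (n_examples, d): one flattened gradient per constitutional
    example. Its right singular vectors span the directions those gradients
    occupy; keep the fewest k whose squared singular values reach `energy`
    of the total.
    """
    _, s, Vt = np.linalg.svd(G, full_matrices=False)
    energy_share = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(energy_share, energy)) + 1, len(s))
    return Vt[:k].T  # columns are orthonormal directions in R^d


def project_orthogonal(g: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Projection step assigned to the GPL: g - B (B^T g), i.e. (I - B B^T) g."""
    return g - basis @ (basis.T @ g)


# Usage with random stand-ins for real gradients (assumed shapes).
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 32))   # 8 constitutional gradients in R^32
B = identity_basis(G)
g_session = rng.standard_normal(32)
g_safe = project_orthogonal(g_session, B)
print(B.shape, float(np.abs(B.T @ g_safe).max()))
```

Computing `g - B (B^T g)` costs O(dk) per step and never materializes the d×d projector, which matters when d is the full parameter count.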
Journey Context:
Standard continual learning suffers from catastrophic forgetting. In long agent sessions, every gradient update (from test-time adaptation on in-context examples, tool-use feedback, or user-preference adaptation) risks overwriting constitutional knowledge. Unlike Elastic Weight Consolidation (EWC), which is computationally expensive, Orthogonal Gradient Descent (OGD) projects each new gradient onto the orthogonal complement of the gradient directions stored for previous tasks. For identity preservation, we compute the 'identity subspace' (the principal components of the gradient covariance matrix on constitutional examples) using SVD. Because the GPL projects every session gradient to be orthogonal to this subspace, the agent can learn and adapt freely in the orthogonal complement, while the loss on constitutional examples is left unchanged to first order.
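The claim can be checked on a toy problem. This sketch assumes a linear least-squares model so everything fits in NumPy; the helpers (`per_example_grads`, `identity_projector`, `run_session`) and the synthetic "constitutional" and "session" data are hypothetical. The right singular vectors of the stacked gradients `G` are the eigenvectors of the uncentered second-moment matrix `G^T G`, so the SVD yields the same principal directions without forming a d×d matrix. For a linear model the gradient span on the constitutional examples equals the span of their inputs, so projected updates leave the constitutional outputs unchanged exactly; for a nonlinear network the guarantee holds only to first order in the step size.

```python
import numpy as np


def per_example_grads(X, y, w):
    """Per-example gradients of 0.5 * (x.w - y)^2 for a linear model; shape (n, d)."""
    residuals = X @ w - y
    return residuals[:, None] * X


def identity_projector(G, tol=1e-10):
    """Return P = I - V_k V_k^T, where V_k spans the row space of G (via SVD)."""
    _, s, Vt = np.linalg.svd(G, full_matrices=False)
    k = int(np.sum(s > tol * s.max()))  # numerical rank of G
    V = Vt[:k].T
    return np.eye(G.shape[1]) - V @ V.T


def run_session(X_s, y_s, w0, steps=50, lr=0.01, P=None):
    """Plain gradient descent on the session loss, optionally projected by P."""
    w = w0.copy()
    for _ in range(steps):
        g = per_example_grads(X_s, y_s, w).mean(axis=0)
        if P is not None:
            g = P @ g  # drop the component inside the identity subspace
        w -= lr * g
    return w


rng = np.random.default_rng(1)
d = 20
X_c, y_c = rng.standard_normal((5, d)), rng.standard_normal(5)    # constitutional data (synthetic)
X_s, y_s = rng.standard_normal((40, d)), rng.standard_normal(40)  # session data (synthetic)
w0 = rng.standard_normal(d)

P = identity_projector(per_example_grads(X_c, y_c, w0))
w_plain = run_session(X_s, y_s, w0)
w_proj = run_session(X_s, y_s, w0, P=P)

# How far the model's outputs on the constitutional inputs moved.
drift_plain = np.linalg.norm(X_c @ (w_plain - w0))
drift_proj = np.linalg.norm(X_c @ (w_proj - w0))
print(f"unprojected drift: {drift_plain:.3e}, projected drift: {drift_proj:.3e}")
```

In a real network the identity basis is only valid near the weights where it was computed, so it would likely need periodic recomputation during long sessions.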
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:41:39.101377+00:00— report_created — created