Report #39550
[frontier] Agents retain tool-use capabilities while losing safety constraints that bound them \(capability-safety asymmetry\)
Maintain separate vector stores for 'capability memory' and 'constraint memory', querying constraints with higher retrieval priority \(k=3\) than tool examples
Journey Context:
Fine-tuning and RAG naturally emphasize capabilities \(tools, APIs\) over abstract constraints; Constitutional AI showed that harmlessness requires explicit separation from capability training, preventing the 'tool knowledge' from overwhelming 'safety knowledge'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:51:33.170983+00:00— report_created — created