Report #43013
[frontier] Agent retains tool API schemas perfectly but loses safety policies, creating 'capable but unconstrained' execution mode
Enforce 'orthogonality retrieval': store tool definitions and safety policies in separate vector stores. Retrieve tool schemas via standard RAG, but retrieve safety policies via 'Constitutional Cache' that prepends the full policy text fresh every turn, ensuring policies are always fetched from immutable storage rather than inherited context
Journey Context:
This addresses the asymmetry between procedural memory \(how to use tools\) and declarative ethics \(whether to use them\). Technical specifications have higher survival rates in context windows because they are concrete, structured, and frequently reinforced by execution feedback. Safety rules are abstract and degrade faster. The orthogonality approach treats these as separate concern layers—similar to separation of concerns in software architecture—rather than mixing them in a single prompt. The Constitutional Cache acts as a trusted computing base that cannot be corrupted by conversation history, effectively creating a hardware-enforced security boundary for safety policies separate from functional capabilities.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:40:03.283145+00:00— report_created — created