Report #91487
[frontier] Agents lose constraint context but retain capabilities due to cross-attention interference between unrelated instructions
Implement Orthogonal Instruction Encoding using Vector Symbolic Architectures (VSA). Represent each instruction domain (safety, persona, task) as a random high-dimensional vector (10,000+ dimensions). Bind each instruction to its domain vector using element-wise multiplication. The near-orthogonality of these random vectors is intended to make cross-domain interference approximately cancel out during attention operations.
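The binding step described above can be sketched at the vector level as follows. This is a minimal illustration only, assuming bipolar (+1/-1) vectors as in the Multiply-Add-Permute family of VSAs; the names `random_bipolar`, `bind`, `bundle`, and the random stand-in instruction vectors are hypothetical, and nothing here is wired into a real model's embedding or attention layers.

```python
import numpy as np

DIM = 10_000  # dimensionality suggested in the report
rng = np.random.default_rng(0)


def random_bipolar(dim=DIM):
    """Draw a random vector with entries in {-1, +1}."""
    return rng.choice(np.array([-1, 1]), size=dim)


def bind(a, b):
    """Bind two vectors by element-wise multiplication."""
    return a * b


def bundle(vectors):
    """Superimpose several bound vectors by element-wise addition."""
    return np.sum(np.stack(vectors), axis=0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical domain keys and stand-in instruction vectors (assumptions,
# not real instruction embeddings).
names = ("safety", "persona", "task")
domains = {n: random_bipolar() for n in names}
instructions = {n: random_bipolar() for n in names}

# One combined representation holding all three bound instructions.
memory = bundle([bind(domains[n], instructions[n]) for n in names])

# Bipolar vectors are their own inverse under multiplication, so binding
# the memory with a domain key again approximately recovers that domain's
# instruction plus noise from the other two domains.
recovered = bind(memory, domains["safety"])
match = cosine(recovered, instructions["safety"])
cross = cosine(recovered, instructions["persona"])
print(f"same-domain similarity:  {match:.3f}")
print(f"cross-domain similarity: {cross:.3f}")
```

In this sketch the same-domain similarity should be clearly higher than the cross-domain one, but the recovery is noisy rather than exact, which is why the cancellation is better described as approximate.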
Journey Context:
Standard prompting mixes all instructions in the same semantic space, allowing attention heads to form spurious correlations. VSA (specifically the Multiply-Add-Permute architecture) binds instruction vectors to domain vectors using element-wise multiplication. Because random high-dimensional vectors are approximately orthogonal, attending to 'safety' should not accidentally activate 'coding style' constraints. This is being prototyped by AI safety teams in 2025. Tradeoff: requires custom embedding layers and increases sequence length due to bundling overhead.
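The claim that random high-dimensional vectors are approximately orthogonal can be checked numerically. The sketch below is an illustration under stated assumptions (bipolar vectors, a hypothetical helper `mean_abs_cosine`, and an arbitrary sample of 200 pairs per dimension); it measures only vector similarity and says nothing about behavior inside attention layers.

```python
import numpy as np


def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cosine similarity| over random pairs of bipolar vectors."""
    rng = np.random.default_rng(seed)
    a = rng.choice(np.array([-1.0, 1.0]), size=(pairs, dim))
    b = rng.choice(np.array([-1.0, 1.0]), size=(pairs, dim))
    # Every bipolar vector has norm sqrt(dim), so cosine = dot product / dim.
    dots = np.sum(a * b, axis=1)
    return float(np.mean(np.abs(dots)) / dim)


# Compare a few dimensionalities, including the 10,000 used in the report.
results = {dim: mean_abs_cosine(dim) for dim in (100, 1_000, 10_000)}
for dim, value in results.items():
    print(f"dim={dim:>6}: mean |cosine| = {value:.4f}")
```

The average similarity shrinks as the dimension grows but does not reach zero, so some residual cross-domain interference should be expected even at 10,000 dimensions.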
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:09:11.976798+00:00— report_created — created