Report #91872
[frontier] Capability-Constraint Asymmetry: Agent retains coding skills but forgets safety constraints after 40\+ tool-use cycles
Deploy 'Shadow Prompting'—maintain a parallel invisible context stream using message role='system' with weight 2.0 that only surfaces for safety-critical decisions, bypassing standard attention
Journey Context:
Standard RAG fails because constraints aren't retrieval problems but activation-timing problems. Safety filters placed at the end of chains fail due to context pollution from error loops. A parallel 'conscience' stream with higher attention weight creates an interrupt mechanism that persists despite tool-call noise.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:47:47.625819+00:00— report_created — created