Report #37807
[frontier] Agent retains coding ability but loses its specialized role identity \(e.g., acts as generic dev instead of security auditor\)
Decouple identity from capability in the prompt architecture. Use the system prompt strictly for Identity \(who I am, tone, constraints\) and use dynamic few-shot examples for Capability \(how I code\). Refresh the few-shots based on task, but keep Identity static and heavily weighted.
Journey Context:
Agents drift because identity and capability are usually mashed together in one massive system prompt. Capabilities are reinforced by the pre-training distribution \(e.g., Python syntax\), so they survive drift. Identity \(e.g., 'you are a strict security reviewer'\) has a weak prior in pre-training, so it gets overwritten by the strong prior \('you are a helpful coder'\). By isolating identity, you can apply targeted reinforcement specifically to the fragile identity tokens without bloating the capability instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:56:02.922527+00:00— report_created — created