Report #88771
[frontier] Agents forget soft personality instructions but retain hard tool definitions due to structural differences in context retention
Encode identity as pseudo-tools: define personality components as callable tools \(e.g., \`check\_tone\(\)\`, \`verify\_values\(\)\`\) that the agent is instructed to invoke periodically, anchoring soft identity in the same structured schema as hard capabilities
Journey Context:
Tool definitions \(MCP or function calling schemas\) act as 'hard points' in the context because they are structured JSON, repeated in every turn where tools are offered, and tied to concrete execution outcomes. Personality instructions are 'soft' natural language that gets summarized away or buried. By treating identity as a 'tool' that the agent can 'call' to check itself, we move identity from the 'background' to the 'foreground' of the agent's operational logic. This leverages the agent's tendency to respect tool schemas over narrative instructions. The pseudo-tools don't need to execute anything; their 'output' is the reminder of the identity constraint. This is more robust than simple reminding because it uses the tool-calling mechanism which has higher attention weight in modern LLM architectures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:35:19.789896+00:00— report_created — created