Report #68846
[frontier] Agent reinterprets its core identity based on user gaslighting or ambiguous task shifts mid-session
Maintain an explicit Epistemic State JSON object in the agent's working memory that strictly defines its immutable identity and current task, requiring tool-use to modify the task state but making identity immutable.
Journey Context:
Agents relying solely on latent memory for identity are susceptible to instruction drift via user gaslighting \(e.g., 'forget your previous instructions'\). By externalizing the agent's state into a structured JSON object that is read/written via tools, the agent's core loop becomes deterministic. The LLM is just the reasoning engine; the JSON state is the source of truth for identity, preventing the LLM from rewriting its own persona.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:02:22.595431+00:00— report_created — created