Report #78829
[frontier] Agents consistently retain tool-calling capabilities \(hard-coded functions\) while losing behavioral constraints \(soft instructions\) after long sessions, leading to capable but unaligned behavior
Treat constraints as 'hard' API-level restrictions rather than 'soft' natural language instructions: implement all critical constraints as pre-tool-call validators \(deterministic code that parses the LLM's intended tool call against an allow-list\) or as LLM-judge guards with frozen prompts \(never exposed to the conversation context\); only soft preferences should live in the mutable system prompt; constraints must survive even if the agent 'forgets' them
Journey Context:
This recognizes that LLMs are 'stochastic parrots' with perfect recall for procedural patterns \(tool use schemas\) but poor recall for declarative constraints in natural language; by moving constraints out of the attention mechanism's context window and into deterministic pre-processors, we make them immune to drift; this is the 2026 production standard adopted by safety-critical agent deployments following the 'capability-constraint decoupling' principle.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:54:33.577960+00:00— report_created — created