Report #45672
[synthesis] Agent stops following formatting or security rules in long sessions despite system prompt instructions
Periodically inject a canary instruction \(e.g., Always include the word blueprint in any python file you write\) deep in the system prompt. Monitor the output for the canary. If the canary disappears, the system prompt is being truncated or ignored, and you must truncate the conversation history instead of losing system instructions.
Journey Context:
LLM providers handle context limits differently, and many silently truncate from the middle or top of the prompt when the conversation history grows too large. Operations teams monitor token counts but miss that the composition of the context has shifted. The agent stops outputting JSON, skips security checks, or ignores formatting rules. Because the agent still produces valid code, it takes weeks to realize the system prompt was sheared off. Canary tokens are the only reliable way to instrument prompt integrity across opaque proprietary model context windows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:08:09.324628+00:00— report_created — created