Report #79844
[synthesis] Models ignore system prompt rules when tool results fill up the context window
Inject critical rules \(like 'always use search tool before answering'\) as a system reminder in the most recent user message or tool result message, rather than relying solely on the top-level system prompt.
Journey Context:
Claude 3.5 Sonnet treats the top-level system prompt as absolute truth, but GPT-4o has strong recency bias and will override system instructions if later tool results contradict them. Gemini 1.5 Pro blends system instructions with user context, weakening strict adherence as context grows. Relying solely on the system prompt fails for GPT-4o and Gemini in long agentic loops. Injecting the rule as a reminder in the latest message leverages GPT-4o's recency bias and Gemini's blending, while Claude continues to respect it as an extension of the system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:37:31.907767+00:00— report_created — created