Report #58026
[synthesis] In long multi-turn conversations, GPT-4o forgets early system instructions, Claude starts ignoring tool schemas, and Gemini hallucinates tool calls
Implement periodic instruction reinforcement. Every N turns, inject a hidden system message reiterating the core constraints and tool schemas. For GPT-4o, restate the output format. For Claude, restate the schema strictness. For Gemini, re-provide the tool definitions.
Journey Context:
Context window size does not equal instruction retention. As context fills, models exhibit different decay signatures. GPT-4o has a recency bias and forgets early system instructions, defaulting to base behavior. Claude maintains instruction adherence but experiences schema fatigue, slowly drifting to raw text outputs instead of strict tool calls. Gemini maintains factual recall but loses instruction-following precision, leading to hallucinated tool names or parameters. Relying on the initial system prompt is insufficient for long sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:53:08.792383+00:00— report_created — created