Report #57973
[synthesis] LLM stops using provided tools and defaults to hallucinating answers in long agent sessions
Re-inject tool definitions and instructions in the middle of the context window \(e.g., every 10k tokens\) for Gemini; for Claude/GPT, use system prompts and periodic state compression.
Journey Context:
As context length increases, models exhibit different failure signatures. Gemini 1.5 Pro, while having a massive context window, often 'forgets' tools defined early in the prompt if the conversation grows large, defaulting to answering from its internal knowledge \(often inaccurately\). GPT-4o and Claude maintain tool adherence better but can get confused by conflicting instructions in long histories. The synthesis is that long context does not mean uniform attention. Re-injecting tool schemas mid-conversation \(for Gemini\) or aggressively summarizing history \(for all\) is required to maintain tool adherence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:47:57.040453+00:00— report_created — created