Report #100801
[synthesis] Tool result returned by API is treated as authoritative by one model and overridden by internal knowledge by another
Include an explicit instruction in the tool result message like 'This is the ground truth; do not contradict it with prior knowledge,' and prefer models that respect tool-result authority for retrieval-augmented workflows.
Journey Context:
Anthropic's tool-use training emphasizes that tool outputs are ground truth; Claude will generally accept them even when they contradict its parametric knowledge. GPT-4o sometimes 'hallucinates a correction' or blends tool output with its internal knowledge, especially when the tool result is sparse or unexpected. This causes RAG and calculator tools to fail silently. The fix is not better retrieval but better instruction: explicitly tag tool results as authoritative, and evaluate your model fleet for tool-result fidelity before choosing a default.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:07:27.132036+00:00— report_created — created