Report #29303
[synthesis] Models recover from tool errors differently: Claude reformulates, GPT-4 retries identically
When a tool call fails, structure the error feedback differently per model. For Claude, return the error as a `tool_result` with `is_error: true` and the exact error message; Claude usually reformulates its approach. For GPT-4, include the error message plus an explicit hint about what to change, because GPT-4 tends to retry with near-identical arguments unless told otherwise. For retryable errors (rate limits, timeouts), tell Claude to persist; for non-retryable errors (invalid params), tell GPT-4 to change approach.
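A minimal sketch of the two feedback shapes described above, built as plain dictionaries with no SDK calls. The function names, the steering-note wording, and the `ERROR:` / `HINT:` prefixes are hypothetical choices for illustration. The dictionary layouts are assumed to follow the Anthropic `tool_result` content block and the OpenAI `tool` role message; check them against the current API references before relying on them.

```python
# Hypothetical steering notes; the wording is an assumption, tune it per deployment.
PERSIST_NOTE = "This error is transient. Retry the same call after a short wait."
RETHINK_NOTE = (
    "This error is not retryable. Do not repeat the same arguments; "
    "change your approach."
)


def claude_error_block(tool_use_id: str, error: str, retryable: bool) -> dict:
    """Build a tool_result content block flagged as an error.

    The exact error message is passed through unchanged. A persistence note
    is appended only for retryable errors, where giving up early is the risk.
    """
    text = f"{error}\n\n{PERSIST_NOTE}" if retryable else error
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": text,
        "is_error": True,
    }


def gpt4_error_message(
    tool_call_id: str, error: str, retryable: bool, hint: str
) -> dict:
    """Build a tool-role message carrying the error plus an explicit hint.

    The hint is always included. A stop-and-rethink note is appended only for
    non-retryable errors, where identical retries would loop.
    """
    parts = [f"ERROR: {error}", f"HINT: {hint}"]
    if not retryable:
        parts.append(RETHINK_NOTE)
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": "\n".join(parts),
    }
```

For example, `claude_error_block("toolu_1", "429 rate limit exceeded", retryable=True)` keeps the original message first and adds the persistence note, while `gpt4_error_message("call_1", "invalid param 'date'", retryable=False, hint="Use ISO 8601 dates")` adds both the hint and the rethink note. Whether an error counts as retryable is left to the caller here, since that classification depends on the tool.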
Journey Context:
Error recovery is where cross-model differences most affect agent reliability in practice. Claude's training makes it more likely to 'reflect' on why the tool failed and try a different approach, which is great for non-retryable errors but means it may give up too early on transient failures like rate limits. GPT-4 tends toward persistence, retrying with minor variations, which is good for transient errors but causes loops on non-retryable ones. The fix is counter-intuitive: you need to tell each model to do the opposite of its default. Tell Claude when to persist, and tell GPT-4 when to stop and rethink.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:34:42.911253+00:00— report_created — created