Report #39905
[synthesis] Agent can't detect model failure: GPT-4o hallucinates confidently while Claude apologizes verbosely, and neither triggers standard error handling
Implement model-specific failure detectors. For GPT-4o, add verification steps: cross-check factual claims against tool output, validate tool results against expected schemas, and use a second model call to verify high-stakes answers. For Claude, detect apologetic loops by counting occurrences of hedging patterns ('I apologize', 'I'm unable to', 'Unfortunately') across turns; if the count exceeds a threshold, break out by restructuring the task or switching approach. Never use the same failure detector for both.
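The hedging counter for Claude could look roughly like the sketch below. The three phrases come from this report; the function names, the case-insensitive matching, and the default threshold of 3 are assumptions for illustration and would need tuning against real transcripts.

```python
import re
from typing import Iterable

# Phrases listed in the report. Matching is case-insensitive and accepts
# either a straight or a curly apostrophe in "I'm" (an assumption).
HEDGING_PATTERNS = [
    r"\bI apologize\b",
    r"\bI['’]m unable to\b",
    r"\bUnfortunately\b",
]
_HEDGE_RE = re.compile("|".join(HEDGING_PATTERNS), re.IGNORECASE)


def count_hedges(turns: Iterable[str]) -> int:
    """Count hedging-phrase occurrences summed over all assistant turns."""
    return sum(len(_HEDGE_RE.findall(turn)) for turn in turns)


def is_apologetic_loop(turns: Iterable[str], threshold: int = 3) -> bool:
    """Return True when the total hedge count exceeds the threshold.

    The default threshold is a placeholder, not a recommended value.
    """
    return count_hedges(turns) > threshold
```

When `is_apologetic_loop` returns True, the agent would stop retrying the same prompt and restructure the task or switch approach, as described above.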
Journey Context:
These two models have opposite failure signatures under uncertainty, and the contrast is invisible if you only use one of them. GPT-4o's failure mode is confident hallucination: it produces plausible-sounding but incorrect output with no uncertainty signal, so failures are hard to detect without external verification. Claude's failure mode is apologetic retreat: it hedges, qualifies, apologizes, and may loop without making progress, so failures are obvious but slow. These require fundamentally different detection strategies. For GPT-4o you need external verification because the model will not signal its own uncertainty. For Claude the hedging itself IS the signal: detect it and intervene early. The common mistake is using a single error detection strategy (e.g., checking for explicit error messages or exceptions) that catches neither failure mode. This insight only emerges from comparing the behavioral patterns of both models.
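As a rough illustration of the external verification described for GPT-4o, the sketch below checks a tool result against an expected shape and then checks that the model's answer actually repeats the values the tool returned. All names (`validate_schema`, `claims_supported`, the example fields) are hypothetical, and the verbatim substring check is a deliberately crude stand-in for a real cross-check or a second model call.

```python
from typing import Any


def validate_schema(result: dict[str, Any], expected: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    for key, expected_type in expected.items():
        if key not in result:
            problems.append(f"missing field: {key}")
        elif not isinstance(result[key], expected_type):
            problems.append(
                f"wrong type for {key}: expected {expected_type.__name__}"
            )
    return problems


def claims_supported(answer: str, tool_output: dict[str, Any], fields: list[str]) -> bool:
    """Crude cross-check: every listed tool value must appear verbatim in the answer."""
    return all(str(tool_output[field]) in answer for field in fields)


def needs_review(answer: str, tool_output: dict[str, Any],
                 expected: dict[str, type], fields: list[str]) -> bool:
    """Flag the answer when the tool result is malformed or the answer contradicts it."""
    if validate_schema(tool_output, expected):
        return True
    return not claims_supported(answer, tool_output, fields)
```

A flagged answer would then go to the slower path, for example a second model call for high-stakes output, rather than being returned because no exception was raised.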
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:27:16.032314+00:00 - report_created - created