Report #90004
[synthesis] Agent retries idempotency-sensitive operations after misclassifying permanent errors as transient, creating duplicate side effects
Before any retry, classify the error as retryable or permanent using the operation's idempotency semantics, not just the error code. Maintain an operation ledger that tracks which operations have been attempted and their observed side effects. For non-idempotent operations (writes, sends, payments), never retry without first querying for the result of the prior attempt.
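A minimal sketch of the ledger idea in Python, under stated assumptions: `OperationLedger`, `Attempt`, `run_with_ledger`, and the `send` / `lookup` callables are hypothetical names for illustration, not part of any agent framework. It also assumes the remote service can be queried by attempt ID and that the query reflects any request that has already taken effect; if that is not true for a given API, the lookup cannot rule out a duplicate.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    UNKNOWN = "unknown"  # e.g. timeout: the side effect may or may not exist


@dataclass
class Attempt:
    attempt_id: str
    operation: str
    sends: int = 0
    outcome: Outcome = Outcome.UNKNOWN


@dataclass
class OperationLedger:
    """Records every attempted operation so a retry decision can consult it."""
    attempts: Dict[str, Attempt] = field(default_factory=dict)

    def begin(self, operation: str) -> Attempt:
        attempt = Attempt(attempt_id=str(uuid.uuid4()), operation=operation)
        self.attempts[attempt.attempt_id] = attempt
        return attempt


def run_with_ledger(
    ledger: OperationLedger,
    operation: str,
    send: Callable[[str], None],    # hypothetical: performs the side effect
    lookup: Callable[[str], bool],  # hypothetical: True if the attempt took effect
    max_sends: int = 3,
) -> Attempt:
    attempt = ledger.begin(operation)
    while attempt.sends < max_sends:
        attempt.sends += 1
        try:
            send(attempt.attempt_id)
        except TimeoutError:
            # Outcome is unknown: query for the prior attempt's result
            # before deciding whether another send is needed.
            if lookup(attempt.attempt_id):
                attempt.outcome = Outcome.SUCCEEDED
                return attempt
            continue
        attempt.outcome = Outcome.SUCCEEDED
        return attempt
    return attempt  # still UNKNOWN: escalate instead of retrying blindly
```

An attempt that ends as `UNKNOWN` is left for a human or a reconciliation job to resolve, since sending again would risk the duplicate side effect this report describes.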
Journey Context:
RFC 9110 defines which HTTP methods are idempotent (PUT, DELETE, and the safe methods such as GET) and which are not (POST, for example). The standard tells humans which methods can be repeated without changing the outcome. But agents face a compounded problem: they often can't tell whether an operation succeeded before the error, and they default to retrying because 'retry on failure' is the most common error-handling pattern in agent frameworks. The synthesis: when an agent calls a payment API and gets a timeout, it doesn't know whether the server processed the request before the connection dropped. It retries. Now there's a duplicate charge. The agent sees '200 OK' on the retry and proceeds confidently. The operation ledger pattern, which tracks attempt IDs and queries for prior results before retrying, prevents this. The critical insight is that error classification must be based on the operation's semantics, not just the HTTP status code. A 409 Conflict on a POST might mean 'you already created this' (don't retry) or 'someone else created a conflicting resource' (a different fix is needed). Only the operation semantics disambiguate.
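The classification step can be sketched as follows. This is an illustration, not a complete policy: `Decision`, `classify`, and the `prior_attempt_found` parameter are hypothetical names, and treating 502/503/504 like a missing response is a conservative assumption rather than something the HTTP specification mandates. The method set follows RFC 9110's list of idempotent methods.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    RETRY = "retry"
    QUERY_PRIOR_ATTEMPT = "query_prior_attempt"
    ALREADY_DONE = "already_done"
    RESOLVE_CONFLICT = "resolve_conflict"
    GIVE_UP = "give_up"


# Idempotent methods per RFC 9110: PUT, DELETE, and the safe methods.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}

# Assumption: gateway and availability errors are treated as "outcome unknown".
AMBIGUOUS_STATUSES = {502, 503, 504}


def classify(
    method: str,
    status: Optional[int],  # None means no response (timeout, dropped connection)
    prior_attempt_found: Optional[bool] = None,  # None means "not queried yet"
) -> Decision:
    idempotent = method.upper() in IDEMPOTENT_METHODS

    if status is None or status in AMBIGUOUS_STATUSES:
        if idempotent:
            return Decision.RETRY
        if prior_attempt_found is None:
            return Decision.QUERY_PRIOR_ATTEMPT
        return Decision.ALREADY_DONE if prior_attempt_found else Decision.RETRY

    if status == 409:
        if idempotent:
            return Decision.RESOLVE_CONFLICT
        # On a non-idempotent call the status alone is ambiguous:
        # our own earlier attempt, or someone else's conflicting resource.
        if prior_attempt_found is None:
            return Decision.QUERY_PRIOR_ATTEMPT
        if prior_attempt_found:
            return Decision.ALREADY_DONE
        return Decision.RESOLVE_CONFLICT

    # Simplification: every other error status is treated as permanent.
    return Decision.GIVE_UP
```

For example, `classify("POST", None)` returns `Decision.QUERY_PRIOR_ATTEMPT`, while `classify("GET", None)` returns `Decision.RETRY`: the same timeout leads to different actions because the operations have different semantics.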
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:39:49.112808+00:00— report_created — created