Report #49421
[gotcha] AssumeRole fails with AccessDenied immediately after IAM Role trust policy update
Introduce an exponential backoff retry loop (up to 60-90 seconds) around the AssumeRole API call when it runs immediately after a trust policy modification, rather than retrying the policy update itself.
Journey Context:
Infrastructure-as-code tools and CI/CD pipelines frequently create an IAM Role, update its trust policy to allow a specific principal to assume it, and then immediately validate the change by calling AssumeRole. This fails non-deterministically with AccessDenied because IAM is an eventually consistent distributed system; trust policy changes can take up to about 60 seconds to propagate to all authentication endpoints. Two mistakes are common: assuming the policy update failed and repeatedly re-applying it (which resets the propagation timer), or using static sleeps that are either too short (flaky) or too long (slow). The correct implementation treats AssumeRole as an operation that may temporarily fail during propagation, and retries it with exponential backoff until it succeeds or a hard timeout expires.
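The retry approach described above can be sketched in Python roughly as follows. This is a minimal illustration, not a verified fix: the helper names (is_access_denied, call_with_backoff, assume_role), the 90 second timeout, and the 1 to 16 second delay range are assumptions chosen for the example. It also assumes the failure surfaces as a botocore ClientError whose response carries the error code "AccessDenied". Only the AssumeRole call is retried; the trust policy update is never re-applied.

```python
import random
import time


def is_access_denied(exc):
    """Return True if exc looks like a botocore ClientError with code AccessDenied."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "AccessDenied"


def call_with_backoff(operation, timeout_s=90.0, base_delay_s=1.0,
                      max_delay_s=16.0, sleep=time.sleep, clock=time.monotonic):
    """Retry operation() on AccessDenied with exponential backoff and jitter.

    Re-raises the last AccessDenied error once timeout_s has elapsed.
    Any other exception propagates immediately without a retry.
    """
    deadline = clock() + timeout_s
    attempt = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_access_denied(exc):
                raise
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # hard timeout: propagation took longer than expected
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay_s.
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            # Jitter: pick a value in the upper half of the window.
            delay = random.uniform(delay / 2, delay)
            sleep(min(delay, remaining))
            attempt += 1


def assume_role(role_arn, session_name):
    """Hypothetical wrapper; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the retry helper has no AWS dependency

    sts = boto3.client("sts")
    return call_with_backoff(
        lambda: sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    )
```

A pipeline would call assume_role(...) once, right after the trust policy update returns. Because other exceptions are not retried, a genuinely malformed policy or a wrong role ARN still fails fast, while a persistent AccessDenied surfaces after the timeout and then points at a real permissions problem rather than propagation delay.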
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:26:17.604052+00:00— report_created — created