Report #62565
[bug_fix] InvalidIdentityToken or IDPCommunicationError when using IRSA on EKS
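Update the thumbprint on the OIDC identity provider in AWS IAM. Retrieve the current certificate thumbprint for your OIDC URL (e.g., `openssl s_client -servername oidc.eks.region.amazonaws.com -connect oidc.eks.region.amazonaws.com:443 </dev/null 2>/dev/null | openssl x509 -fingerprint -sha1 -noout | cut -d= -f2 | tr -d ':' | tr 'A-Z' 'a-z'`), then run `aws iam update-open-id-connect-provider-thumbprint --open-id-connect-provider-arn <provider-arn> --thumbprint-list <thumbprint>`. Note that, as written, this pipeline fingerprints the first certificate the server presents (the leaf). To fingerprint the top intermediate CA certificate instead, add `-showcerts` and use the last certificate in the printed chain.
Root cause: AWS periodically rotates the TLS certificates for EKS OIDC endpoints. The IAM OIDC provider stores a thumbprint of the top intermediate CA certificate to guard against man-in-the-middle attacks. When a rotation changes the certificate the thumbprint was taken from, the stored thumbprint no longer matches, and STS rejects the projected service account token.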
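The retrieval step above can be sketched as a few shell helpers. This is a minimal sketch, not a verified procedure: the function names (`last_pem_cert`, `normalize_fingerprint`, `oidc_thumbprint`) and the `PROVIDER_ARN` variable are hypothetical, and it assumes the top intermediate CA is the last certificate printed by `openssl s_client -showcerts`.

```shell
# Print only the last PEM certificate found on stdin.
# Assumption: with -showcerts, the last certificate in the chain output
# is the top intermediate CA.
last_pem_cert() {
  awk '
    /-----BEGIN CERTIFICATE-----/ { cert = ""; in_cert = 1 }
    in_cert                       { cert = cert $0 "\n" }
    /-----END CERTIFICATE-----/   { last = cert; in_cert = 0 }
    END                           { printf "%s", last }
  '
}

# Turn "SHA1 Fingerprint=AB:CD:..." into lowercase hex without colons.
normalize_fingerprint() {
  cut -d= -f2 | tr -d ':' | tr 'A-Z' 'a-z'
}

# Fetch the certificate chain for host $1 and print the SHA-1
# thumbprint of its last certificate.
oidc_thumbprint() {
  openssl s_client -servername "$1" -showcerts -connect "$1:443" </dev/null 2>/dev/null \
    | last_pem_cert \
    | openssl x509 -fingerprint -sha1 -noout \
    | normalize_fingerprint
}

# Example usage (commented out; host and PROVIDER_ARN are placeholders):
# thumbprint=$(oidc_thumbprint "oidc.eks.region.amazonaws.com")
# aws iam update-open-id-connect-provider-thumbprint \
#   --open-id-connect-provider-arn "$PROVIDER_ARN" \
#   --thumbprint-list "$thumbprint"
```

Splitting the chain handling from the fingerprint normalization keeps each piece checkable on its own before anything is written to IAM.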
Journey Context:
A platform engineer sets up IRSA (IAM Roles for Service Accounts) on a new EKS cluster. Pods initially assume IAM roles successfully. Six months later, without any changes to the cluster or IAM roles, new pods scheduled on new nodes start failing with 'InvalidIdentityToken: Couldn't retrieve verification key from your identity provider'. The engineer checks the IAM role and its trust policy; both are correct, and the OIDC URL and sub condition match the service account name. They find a GitHub issue mentioning the OIDC provider thumbprint. They look up the thumbprint stored in the IAM console and compare it with the `openssl s_client -showcerts` output for their OIDC endpoint: the two differ. AWS rotated the certificate for the OIDC provider (this happens periodically), so the thumbprint in IAM is the old one. They run `aws iam update-open-id-connect-provider-thumbprint` with the new one, and pods work immediately. The fix works because STS validates the JWT signature of the projected token against the OIDC provider's public key, which it has to retrieve from the OIDC endpoint over TLS. If the stored thumbprint (a hash of the certificate) doesn't match the certificate chain the endpoint currently presents, the trust chain is broken and STS cannot retrieve the verification key.
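The comparison the engineer performs by hand can be sketched as a small check. This is an illustrative sketch only: `thumbprint_is_stale`, `live`, `stored`, and `PROVIDER_ARN` are hypothetical names, and it assumes the live thumbprint has already been computed separately from the endpoint's certificate chain.

```shell
# Succeeds (exit 0) when the live thumbprint in $1 is absent from the
# whitespace-separated list of stored thumbprints in $2, i.e. the IAM
# entry looks stale. Comparison is case-insensitive.
thumbprint_is_stale() {
  live=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  for stored_item in $(printf '%s' "$2" | tr 'A-Z' 'a-z'); do
    if [ "$stored_item" = "$live" ]; then
      return 1
    fi
  done
  return 0
}

# Example usage (commented out; PROVIDER_ARN and live_thumbprint are placeholders):
# stored=$(aws iam get-open-id-connect-provider \
#   --open-id-connect-provider-arn "$PROVIDER_ARN" \
#   --query 'ThumbprintList' --output text)
# if thumbprint_is_stale "$live_thumbprint" "$stored"; then
#   echo "stored thumbprint differs from the live certificate"
# fi
```

Running a check like this before updating IAM confirms that a thumbprint mismatch is actually present, rather than some other cause of the same error.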
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:30:05.384965+00:00— report_created — created