Report #7242
[bug\_fix] Token has been expired or revoked due to clock skew during refresh
Synchronize the system clock using NTP \(Network Time Protocol\). Ensure the operating system's time service \(e.g., systemd-timesyncd, chrony, or ntpd\) is running and configured to sync with a reliable time source \(e.g., metadata.google.internal on GCP, time.aws.com on AWS, or pool.ntp.org\). Containers share the host's clock by default, so in a containerized deployment time sync normally has to be fixed on the host or node; also verify that the hypervisor time is correct.
Root cause: Tokens and signed assertions used in OAuth2 flows carry time-based claims \(issued-at 'iat' and expiration 'exp'\). Google's authorization servers \(and AWS/Azure equivalents\) validate these timestamps against server time. If the client clock is skewed by more than a few minutes \(typically 5 minutes\), the server rejects token requests with 'Token has been expired or revoked' or 'Invalid JWT: Token must be a short-lived token' because the request appears to have already expired or to have been issued in the future.
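The skew tolerance described above can be checked before touching any credentials. The following is a minimal sketch, not a definitive tool: it compares a local timestamp against the value of an HTTP `Date` response header from a server whose clock is trusted. The function names and the `MAX_SKEW` constant are hypothetical, and the 5-minute limit is simply the typical figure quoted in this report, not a documented server setting.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance: the report cites "typically 5 minutes".
MAX_SKEW = timedelta(minutes=5)


def clock_skew(local_now: datetime, server_date_header: str) -> timedelta:
    """Return local time minus server time.

    A positive result means the local clock is running fast.
    `server_date_header` is an RFC 7231 style value such as
    "Tue, 16 Jun 2026 02:12:22 GMT".
    """
    server_now = parsedate_to_datetime(server_date_header)
    return local_now - server_now


def skew_is_acceptable(skew: timedelta, limit: timedelta = MAX_SKEW) -> bool:
    """True when the absolute skew is within the assumed tolerance."""
    return abs(skew) <= limit


if __name__ == "__main__":
    # Example values only: a local clock 8 minutes ahead of the server.
    local = datetime(2026, 6, 16, 2, 20, 22, tzinfo=timezone.utc)
    skew = clock_skew(local, "Tue, 16 Jun 2026 02:12:22 GMT")
    print(f"skew={skew.total_seconds():.0f}s acceptable={skew_is_acceptable(skew)}")
```

In practice `local_now` would be `datetime.now(timezone.utc)` and the header would come from any HTTPS response issued by a trusted server. The `Date` header has one-second resolution and includes network latency, so this is only a coarse check; `chronyc tracking` or `timedatectl` on the host gives a more precise reading.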
Journey Context:
A site reliability engineer deploys a Go microservice to Google Kubernetes Engine. The service uses Workload Identity to access Cloud SQL. After running stably for weeks, the service suddenly logs continuous errors: 'oauth2/google: unable to generate access token: Post "https://oauth2.googleapis.com/token": oauth2: cannot fetch token: 401 Unauthorized, \{"error":"invalid\_grant","error\_description":"Token has been expired or revoked."\}'. The engineer checks the Workload Identity service account permissions and finds them unchanged. They exec into the pod and curl the metadata server token endpoint manually; it returns a token successfully. However, when they decode the JWT, they notice that the 'iat' claim is 8 minutes in the future relative to their laptop's clock. They check the node time with 'date' and confirm that the GKE node's system clock is 8 minutes fast, likely due to a paused VM or NTP drift. The Google OAuth server rejects the token refresh because the request's timestamp \(signed by the metadata server, which uses node time\) is too far in the future. Restarting the node pool forces an NTP resync and resolves the issue. The lasting fix is keeping the node clock synchronized via NTP.
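The decoding step in the journey above can be sketched as follows. This is an illustrative helper, not part of any Google library: it assumes the token in hand is a JWT with three dot-separated base64url segments (some tokens are opaque and cannot be decoded this way), it does not verify the signature, and the function names are hypothetical.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature.

    Suitable only for inspecting claims such as 'iat' and 'exp' while debugging.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload = parts[1]
    # base64url segments in JWTs have their '=' padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def iat_offset_seconds(claims: dict, reference_epoch: float) -> float:
    """Return 'iat' minus the reference time, in seconds.

    A positive result means the token claims to have been issued in the
    future relative to the reference clock.
    """
    return claims["iat"] - reference_epoch
```

Calling `iat_offset_seconds(decode_jwt_claims(token), time.time())` on a machine with a trusted clock gives the apparent skew of the issuer. A result near 480 would correspond to the 8-minute offset observed in this report.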
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:12:22.209270+00:00 - report_created - created