Report #24675
[bug\_fix] Azure ManagedIdentityCredential timeout or hang in AKS/ACI due to IMDS throttling or network policies
Use WorkloadIdentityCredential instead of ManagedIdentityCredential in AKS clusters with Workload Identity enabled. Alternatively, increase the credential's IMDS retry and timeout settings, or make sure the IMDS endpoint \(169.254.169.254\) is reachable on port 80.
Root cause: ManagedIdentityCredential requests tokens from the Instance Metadata Service \(IMDS\) at 169.254.169.254. In AKS without Workload Identity, this succeeds only if the node itself has a managed identity, which every pod on that node then shares \(not recommended for pod-level auth\). In ACI, the request hangs if the container group has no managed identity assigned or if network policies block link-local addresses. IMDS is also rate-limited to 5 requests per second per VM; during pod startup storms, throttled requests hit 30-second timeouts on each retry, and the accumulated delay exhausts DefaultAzureCredential's credential chain.
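The two environmental checks in the fix (IMDS reachable on port 80, and tolerance of IMDS throttling) can be approximated with a standard-library-only diagnostic run from inside the affected pod or container group. This is a minimal sketch, not azure-identity's own retry logic: `imds_reachable`, `backoff_delays`, and `fetch_imds_token` are hypothetical helper names, the backoff schedule is an assumed example, and the token URL uses the managed-identity path `/metadata/identity/oauth2/token` with `api-version=2018-02-01` and Azure Resource Manager as an assumed example resource.

```python
import socket
import time
import urllib.error
import urllib.request

# Link-local IMDS address and managed-identity token path.
# The resource (Azure Resource Manager) is an assumed example.
IMDS_HOST = "169.254.169.254"
IMDS_PORT = 80
TOKEN_URL = (
    f"http://{IMDS_HOST}/metadata/identity/oauth2/token"
    "?api-version=2018-02-01&resource=https://management.azure.com/"
)


def imds_reachable(timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to IMDS on port 80 opens within `timeout` seconds."""
    try:
        with socket.create_connection((IMDS_HOST, IMDS_PORT), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and socket timeouts,
        # e.g. when a NetworkPolicy silently drops egress to the link-local address.
        return False


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule base, 2*base, 4*base, ..., each capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_imds_token(retries: int = 4, timeout: float = 2.0) -> bytes:
    """Request a token from IMDS, sleeping between attempts on 429 (throttled) or 5xx.

    Connection failures and socket timeouts are raised immediately, because
    waiting does not fix a blocked or missing endpoint.
    """
    request = urllib.request.Request(TOKEN_URL, headers={"Metadata": "true"})
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            retryable = err.code == 429 or err.code >= 500
            if not retryable or attempt == retries:
                raise
        time.sleep(delays[attempt])
    raise AssertionError("unreachable: the loop always returns or raises")
```

A `False` from `imds_reachable()` points at an egress rule such as a NetworkPolicy rather than at throttling, while a 429 that persists after the backoff points at throttling. A successful TCP connection does not prove the pod or container group has an identity assigned; only the token request answers that.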
Journey Context:
A developer deploys a Python app to AKS using DefaultAzureCredential. The app hangs for 2\+ minutes on startup, then crashes with 'DefaultAzureCredential failed to retrieve a token from the included credentials'. Logs show ManagedIdentityCredential attempting to reach http://169.254.169.254/metadata/instance/compute?api-version=2021-05-01 and timing out. The developer checks whether the AKS node pool has a managed identity enabled \(it does\), then realizes that the pod is not using AKS Workload Identity, so it falls back to the node's managed identity via IMDS. IMDS is being throttled because another pod on the same node has just started 5 other containers. The developer also suspects that a NetworkPolicy might be blocking egress to 169.254.169.254. After switching to AKS Workload Identity and using WorkloadIdentityCredential \(or a recent DefaultAzureCredential, which picks up workload identity when the AZURE\_FEDERATED\_TOKEN\_FILE, AZURE\_CLIENT\_ID, and AZURE\_TENANT\_ID env vars are set\), the app authenticates with the projected service account token file instead of IMDS, bypassing both the throttling and the network issues.
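The switch can be made explicit so that a pod configured for Workload Identity never falls through to IMDS. This is a hedged sketch, assuming a recent `azure-identity` release in which `WorkloadIdentityCredential()` reads `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_FEDERATED_TOKEN_FILE` from the environment when called without arguments; `workload_identity_configured` and `build_credential` are hypothetical helper names, and the injectable `file_exists` parameter exists only to make the check testable.

```python
import os
from typing import Callable, Mapping

# Variables the AKS Workload Identity webhook injects into opted-in pods.
WORKLOAD_IDENTITY_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")


def workload_identity_configured(
    env: Mapping[str, str] = os.environ,
    file_exists: Callable[[str], bool] = os.path.isfile,
) -> bool:
    """True if every workload identity variable is non-empty and the
    projected service account token file exists."""
    if not all(env.get(name) for name in WORKLOAD_IDENTITY_VARS):
        return False
    return file_exists(env["AZURE_FEDERATED_TOKEN_FILE"])


def build_credential():
    """Prefer WorkloadIdentityCredential; fall back to ManagedIdentityCredential (IMDS)."""
    # Imported lazily so the detection logic above is usable without azure-identity.
    from azure.identity import ManagedIdentityCredential, WorkloadIdentityCredential

    if workload_identity_configured():
        # Assumes the no-argument form reads the three environment variables above.
        return WorkloadIdentityCredential()
    # No projected token: this path goes through IMDS and can hit IMDS throttling.
    return ManagedIdentityCredential()
```

Usage would look like `build_credential().get_token("https://management.azure.com/.default")`. Falling back to `ManagedIdentityCredential` keeps the app working on hosts that genuinely rely on IMDS; if the pod should only ever use Workload Identity, raising an error instead of falling back surfaces a missing webhook label immediately rather than after a multi-minute IMDS timeout. With `DefaultAzureCredential`, passing `exclude_managed_identity_credential=True` has a similar effect.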
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:49:35.959728+00:00— report_created — created