Report #635
[bug\_fix] Pod stuck in Pending
Run \`kubectl describe pod <pod-name>\` and read the Events and Conditions sections. If you see \`Insufficient cpu\` or \`Insufficient memory\`, lower the pod's resource requests or add capacity to the node pool. If the event says \`0/N nodes are available: N node\(s\) had taint \{key=value:NoSchedule\}\`, add a matching toleration to the pod or remove the taint from the node. If the event reports that nodes did not match the pod's node affinity or nodeSelector, relax those rules so that at least one node qualifies. For PVC-backed pods, ensure the PersistentVolumeClaim is bound \(\`kubectl get pvc\` should show status \`Bound\`\).
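A minimal sketch of the toleration and resource-request fixes, assuming the taint in the event is literally \`key=value:NoSchedule\`. The pod name, container name, image, and request values are hypothetical placeholders; substitute the taint shown in your own events.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                        # hypothetical name
spec:
  containers:
    - name: app                            # hypothetical container
      image: registry.example.com/app:1.0  # hypothetical image
      resources:
        requests:
          cpu: "250m"                      # illustrative; lower if events report Insufficient cpu
          memory: "256Mi"                  # illustrative; lower if events report Insufficient memory
  tolerations:
    - key: "key"                           # must equal the taint key from the event
      operator: "Equal"                    # Equal requires the value to match as well
      value: "value"
      effect: "NoSchedule"
```

A toleration only permits scheduling onto the tainted node. It does not steer the pod there, so the node must still satisfy any nodeSelector or affinity rules.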
Journey Context:
You submit a GPU training job and the pod stays Pending for ten minutes. \`kubectl describe pod\` reports \`0/4 nodes are available: 1 node\(s\) had taint \{nvidia.com/gpu:NoSchedule\}, 3 node\(s\) didn't match Pod's node affinity\`. The pod spec has \`nodeSelector: accelerator: nvidia-a100\` but the cluster only has \`nvidia-t4\` nodes. You change the nodeSelector to match the nodes that exist and, if the matching node is the tainted one, also add a toleration for the GPU taint; the scheduler then places the pod. Here, Pending means the API server accepted the pod but the scheduler cannot find a node that satisfies its constraints; the event log is the authoritative source for which constraint failed.
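The fix for this scenario could look roughly like the spec below. It is a sketch under stated assumptions: the pod name, container name, and image are hypothetical, the nodes are assumed to carry the label \`accelerator: nvidia-t4\`, and the \`nvidia.com/gpu\` resource is assumed to be exposed by a GPU device plugin on the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training                       # hypothetical name
spec:
  nodeSelector:
    accelerator: nvidia-t4                 # was nvidia-a100; must match a label present on some node
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"                   # tolerate the taint whatever its value
      effect: "NoSchedule"
  containers:
    - name: trainer                        # hypothetical container
      image: registry.example.com/trainer:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1                # assumes a device plugin advertises this resource
```

Before editing the selector, \`kubectl get nodes --show-labels\` shows which labels the nodes actually carry, which avoids guessing at the value.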
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T10:55:31.792677+00:00— report_created — created