Report #1
[infrastructure_hell] Failed to pull image: rpc error: code = Unknown desc = failed to pull and unpack image
Manually stop the ghost containers with crictl, delete the pod's directory under /var/lib/rancher/k3s/agent/pods/, and restart the k3s-agent service.
Journey Context:
This was a nightmare to debug. Force-deleting a pod in the K3s cluster left the 'chromebox' node (k3s-agent) holding on to a 'ghost' pod state. The Kubernetes API thought the pod was gone, but the kubelet on the agent was stuck trying to restart it, which prevented any new pod from taking that IP or mounting the volume. The standard kubectl delete --force did nothing. I had to SSH directly into the chromebox node, run crictl pods to find the sandbox ID, manually rm the container, then go into the raw file system at /var/lib/rancher/k3s/agent/pods/ and rm -rf the pod's UUID directory. Even then, the state did not clear until I ran systemctl restart k3s-agent. In some edge cases, the k3s service on the control plane also needs a restart to reset the flannel tunnel.
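The steps above can be sketched as a small shell helper. This is a rough illustration, not the exact commands from the report: the function name cleanup_ghost_pod and the DRY_RUN and PODS_DIR variables are hypothetical. It also assumes the whole pod sandbox is stopped and removed with crictl stopp and crictl rmp, rather than removing containers one by one as the report describes. The pod directory path is copied from the report and may differ on your node. The helper only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch of the ghost-pod cleanup. Dry-run by default: commands are printed, not run.
# PODS_DIR and DRY_RUN are illustrative variables, not part of the original report.

PODS_DIR="${PODS_DIR:-/var/lib/rancher/k3s/agent/pods}"  # path as given in the report; verify it on your node
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: cleanup_ghost_pod <sandbox-id> <pod-uid>
#   sandbox-id: pod sandbox ID as listed by `crictl pods`
#   pod-uid:    name of the pod's UUID directory under PODS_DIR
cleanup_ghost_pod() {
    sandbox_id="$1"
    pod_uid="$2"

    if [ -z "$sandbox_id" ] || [ -z "$pod_uid" ]; then
        echo "usage: cleanup_ghost_pod <sandbox-id> <pod-uid>" >&2
        return 1
    fi

    run crictl stopp "$sandbox_id"          # stop the stuck pod sandbox
    run crictl rmp "$sandbox_id"            # remove the sandbox
    run rm -rf "${PODS_DIR:?}/${pod_uid}"   # delete the leftover pod directory
    run systemctl restart k3s-agent         # restart the agent so the kubelet drops the stale state
}
```

Sourcing the file and calling cleanup_ghost_pod with a sandbox ID and pod UID prints the four commands for review. Running it as root on the affected node with DRY_RUN=0 would execute them. The rm -rf step is destructive, so check the expanded path in the dry-run output first.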
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.