Report #6349
[bug\_fix] 'No space left on device' or 'write /var/lib/docker/tmp/...: no space left on device' during Docker image builds, large npm installs, or when caching large artifacts on GitHub-hosted runners
Add explicit cleanup steps at the start of the job to remove pre-installed software the build does not need \(e.g., \`sudo rm -rf /opt/hostedtoolcache/CodeQL\`, \`sudo rm -rf /usr/share/dotnet\`, \`docker system prune -af\`\); this reclaims roughly 5-10GB. If the job needs more space on an ongoing basis, migrate to GitHub-hosted larger runners \(available on the GitHub Team and Enterprise Cloud plans\), which provide up to 64GB SSD storage, or use self-hosted runners with attached storage volumes.
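The cleanup described above could be sketched as a single script run as the job's first step. This is a minimal sketch, not a verified workaround: the `DRY_RUN` switch and the `free_kb` and `reclaim` helper names are assumptions introduced here for illustration, and the paths are the ones named in this report, which may not exist on every runner image.

```shell
#!/usr/bin/env bash
# Sketch of a disk-cleanup step for a GitHub-hosted Ubuntu runner.
# DRY_RUN defaults to 1 (an assumption of this sketch): nothing is deleted
# unless the step is run with DRY_RUN=0.
set -euo pipefail

# Paths named in this report; whether they exist depends on the runner image.
CLEANUP_PATHS=(
  /opt/hostedtoolcache/CodeQL
  /usr/share/dotnet
)

# Free space on the root filesystem in KiB (POSIX df output, 4th column).
free_kb() {
  df -Pk / | awk 'NR==2 {print $4}'
}

# Remove each given path if it exists; print what is (or would be) done.
reclaim() {
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "skip $p (not present)"
    elif [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would remove $p"
    elif [ -w "$(dirname "$p")" ]; then
      rm -rf -- "$p"
      echo "removed $p"
    else
      sudo rm -rf -- "$p"   # root-owned locations need sudo
      echo "removed $p"
    fi
  done
}

before=$(free_kb)
reclaim "${CLEANUP_PATHS[@]}"
if [ "${DRY_RUN:-1}" != "1" ] && command -v docker >/dev/null 2>&1; then
  # Drops stopped containers, unused networks, unused images, and build cache.
  docker system prune -af
fi
after=$(free_kb)
echo "reclaimed $(( (after - before) / 1024 )) MiB"
```

Running it once with the default `DRY_RUN=1` shows which paths are present on the runner before anything is deleted; setting `DRY_RUN=0` performs the removal and prints how much space was recovered.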
Journey Context:
A machine learning team adds a new CUDA base image to their CI. The workflow builds a Docker image and pushes it to GHCR. One day, every build starts failing at the 'docker build' step with 'no space left on device'. The team first suspects a layer caching issue and runs \`docker builder prune\`, but the error persists. They add a \`df -h\` debug step and see that the root filesystem is 100% full. The new CUDA base image is 8GB and the compiled assets add another 6GB, so the build needs about 14GB, which is all of the ~14GB a standard runner has left once pre-installed tools are accounted for. They check the GitHub documentation and confirm the storage limits for standard runners. They add a cleanup step that removes the CodeQL bundle, the .NET SDK, and the Android SDK \(\`sudo rm -rf /opt/hostedtoolcache/CodeQL /usr/share/dotnet /usr/local/lib/android\`\), freeing approximately 7GB, and the build succeeds. Later, as the team adds more models, they migrate to a 'Larger runner' with 64GB disk so they no longer have to manage disk space on every change.
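The \`df -h\` debugging step in this journey can be turned into a pre-flight check that compares free space with what the build is expected to need. The sketch below is illustrative only: the `free_gib` and `require_free_gib` helper names are hypothetical, and the 8 + 6 figure is taken from the report and treated as approximate.

```shell
#!/usr/bin/env bash
# Sketch: compare free disk space with the build's estimated footprint.
set -euo pipefail

# Whole GiB available on the filesystem holding the given path (default /).
free_gib() {
  local kb
  kb=$(df -Pk "${1:-/}" | awk 'NR==2 {print $4}')
  echo $(( kb / 1024 / 1024 ))
}

# Return 0 if at least $1 GiB are free on the filesystem of $2; otherwise
# print the shortfall and the df report to stderr and return 1.
require_free_gib() {
  local need=$1 path=${2:-/} have
  have=$(free_gib "$path")
  if [ "$have" -lt "$need" ]; then
    echo "need ${need} GiB on ${path}, only ${have} GiB free" >&2
    df -h "$path" >&2
    return 1
  fi
  echo "ok: ${have} GiB free on ${path} (need ${need} GiB)"
}

# Figures from the report: 8 GB base image + 6 GB compiled assets.
NEEDED_GIB=$(( 8 + 6 ))
if ! require_free_gib "$NEEDED_GIB" /; then
  # GitHub Actions renders lines in this format as warning annotations.
  echo "::warning::not enough disk for the build; clean up or use a larger runner"
fi
```

As written, the check only warns. Replacing the `echo` in the final `if` with `exit 1` would make the job fail before the 'docker build' step instead of partway through it.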
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:48:37.488856+00:00— report_created — created