Report #42574
[bug\_fix] Docker build fails with 'no space left on device' or runner crashes during image builds
Do not rely on the local Docker daemon's layer cache on ephemeral GitHub-hosted runners; it is discarded with the runner when the job ends. Use \`docker/build-push-action\` with BuildKit and either the GitHub Actions cache backend \(\`type=gha\`\), which exports cache to GitHub's external cache infrastructure, or a registry cache \(\`type=registry\`\), which stores cache layers in a remote registry. If the build itself runs out of disk space, reclaim space \(BuildKit garbage collection or \`docker system prune\`\) or use larger runners; GitHub-hosted 2-core runners have limited disk.
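The registry-cache option mentioned above could look roughly like the following workflow. This is a minimal sketch, not a verified configuration: the workflow name, the image name `ghcr.io/example-org/example-app`, and the `buildcache` tag are hypothetical placeholders, and the action versions are assumptions that should be checked against current releases.

```yaml
# Sketch only: names, tags, and action versions below are assumptions.
name: build-with-registry-cache
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # lets GITHUB_TOKEN push image and cache to GHCR
    steps:
      - uses: actions/checkout@v4

      # Creates a BuildKit builder that supports cache export.
      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/example-org/example-app:latest   # hypothetical image
          # Cache is stored under a separate tag in the same repository.
          cache-from: type=registry,ref=ghcr.io/example-org/example-app:buildcache
          cache-to: type=registry,ref=ghcr.io/example-org/example-app:buildcache,mode=max
```

Keeping the cache under its own tag separates it from the runnable image, and `mode=max` also exports intermediate layers, which tends to matter for multi-stage builds.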
Journey Context:
A developer sets up a workflow that builds a large Docker image \(3GB with dependencies\) using plain \`docker build\`. The first run succeeds but takes 20 minutes. They expect later runs to be faster thanks to layer caching, yet every run still takes 20 minutes. The cause: GitHub-hosted runners are ephemeral virtual machines that are destroyed when the job ends, and the local Docker daemon and its layer cache are destroyed with them.

They first try \`actions/cache\` on \`/var/lib/docker\`. This fails because Docker runs as a service, so the directory is in use and the cache action cannot archive open files or the state of a running daemon. They then switch to \`docker/build-push-action\` with BuildKit, but without an external cache configured the cache is still written only to the runner's local disk and is lost with it.

After researching BuildKit cache exporters they find the GitHub Actions cache backend \(\`type=gha\`\) and set \`cache-from: type=gha\` and \`cache-to: type=gha,mode=max\`. BuildKit now exports the build cache layers to GitHub's external cache infrastructure rather than the runner's disk, so later runs restore the cache even on fresh ephemeral runners, and build time drops to 3 minutes. They also learn that if the build itself runs out of disk space \(as opposed to the cache\), they need to run \`docker system prune\` between steps or upgrade to larger GitHub-hosted runners.
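The final configuration described above can be sketched as a workflow like this one. It is an illustration under stated assumptions: the workflow name and the `example-app:ci` tag are hypothetical, the action versions may need updating, and the disk-space steps are optional additions for diagnosing 'no space left on device' failures.

```yaml
# Sketch only: names, tags, and action versions below are assumptions.
name: build-with-gha-cache
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Optional: record free space before the build to help diagnose
      # 'no space left on device' errors.
      - name: Show free disk space
        run: df -h /

      # Creates a BuildKit builder that supports exporting cache.
      - uses: docker/setup-buildx-action@v3

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: example-app:ci          # hypothetical tag
          cache-from: type=gha          # restore from the GitHub Actions cache
          cache-to: type=gha,mode=max   # export all layers, including intermediate ones

      # Optional: only useful if later steps in the same job need the space.
      - name: Reclaim disk space before later steps
        run: docker system prune --all --force
```

The `docker/setup-buildx-action` step is included because cache export generally needs a builder other than the default `docker` driver; whether it is strictly required depends on the Docker version and image store in use on the runner.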
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:55:44.888759+00:00— report_created — created