Report #17239
[bug_fix] No space left on device during Docker build or dependency installation on self-hosted runner
Add explicit cleanup steps at the beginning or end of the workflow: `docker system prune -af --volumes` removes unused images, stopped containers, and volumes, while `rm -rf ${{ github.workspace }}/*` or tool-specific cache clears remove leftover build artifacts. For persistent self-hosted runners, also configure a systemd timer or cron job to run the cleanup regularly. The root cause is that GitHub Actions self-hosted runners are persistent environments: unlike ephemeral GitHub-hosted runners, they do not automatically clean up Docker layers, build artifacts, or tool caches between job executions.
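The scheduled cleanup mentioned above could look roughly like the following POSIX shell sketch. It prunes only when disk usage crosses a threshold, which is an assumption added here and not part of the report; the `THRESHOLD` variable, the `--run` flag, and the function names are likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical scheduled cleanup for a persistent self-hosted runner.
# THRESHOLD, --run, and the function names are illustrative assumptions.
set -eu

THRESHOLD="${THRESHOLD:-80}"  # percent of disk used that triggers a prune

# Print the used percentage (digits only) of the filesystem holding the path.
usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed when usage (first argument) is at or above the limit (second argument).
needs_cleanup() {
  [ "$1" -ge "$2" ]
}

# Prune only when invoked with --run, so the file can be sourced without side effects.
if [ "${1:-}" = "--run" ]; then
  used="$(usage_percent /var/lib/docker)"
  if needs_cleanup "$used" "$THRESHOLD"; then
    docker system prune -af --volumes
  fi
fi
```

Saved under a path of your choosing (for example a hypothetical `/usr/local/bin/runner-cleanup.sh`), it could be scheduled with a crontab entry such as `0 3 * * * /usr/local/bin/runner-cleanup.sh --run`. Pruning during a running job can remove images that job still needs, so pick a quiet window.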
Journey Context:
A DevOps team uses a pool of persistent self-hosted runners to build large Docker images for microservices. Initially, builds complete in 10 minutes, but after two weeks, afternoon builds start failing intermittently with 'write /var/lib/docker/tmp/...: no space left on device' during the Docker build step. The team SSHs into the affected runner and runs `df -h`, confirming the root partition is 100% full. They run `docker images` and see hundreds of old images consuming 150GB, including many tagged `<none>` (dangling) and previous versions of their microservice images. They manually run `docker system prune -af --volumes`, which frees 140GB and immediately fixes the issue. Realizing that the runner persists between jobs, unlike GitHub-hosted ephemeral runners, they modify the workflow to include a 'Cleanup Docker' step at the very end with `if: always()` so that it runs even if the build fails. The step either runs `docker system prune -af` or, to keep the cache, removes only the images just built by passing their specific tags to `docker rmi`. They also add a 'Pre-cleanup' step at the start to handle cases where a previous run crashed before reaching the cleanup step. Disk usage stabilizes and builds remain fast.
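The targeted variant of the 'Cleanup Docker' step, which removes only the tags built in the current job, might be sketched as below. This is an illustration under assumptions, not the team's actual script: the function names and the `DOCKER_BIN` override are hypothetical, and how much cache survives depends on the builder in use.

```shell
#!/bin/sh
# Sketch of a post-job cleanup that removes only the tags built in this job.
# DOCKER_BIN and the function names are hypothetical, for illustration only.
set -u

DOCKER_BIN="${DOCKER_BIN:-docker}"

# Remove each image tag passed as an argument. A missing tag is not treated
# as an error, since a failed build may never have produced it.
remove_built_images() {
  for image in "$@"; do
    "$DOCKER_BIN" rmi "$image" || echo "skipped: $image" >&2
  done
}

# Drop dangling (<none>) images left behind by rebuilds, keeping tagged ones.
prune_dangling() {
  "$DOCKER_BIN" image prune -f
}
```

A workflow step guarded by `if: always()` could source this file and then call, for instance, `remove_built_images "myservice:${GITHUB_SHA}"` followed by `prune_dangling`, where `myservice` stands in for whatever image name the job actually builds. The same two calls would also suit the 'Pre-cleanup' step, since both tolerate images that are already gone.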
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:50:41.424824+00:00— report_created — created