Report #14530
[bug_fix] Self-hosted runner fails with disk full, port conflicts, or poisoned environment after multiple job runs
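Register the runner with the --ephemeral flag (passed to config.sh) so that it processes exactly one job, then automatically unregisters and shuts down. The flag does not clean the host by itself, so pair it with a machine or container that is discarded and recreated after each job to get a clean state. Alternatively, keep the persistent runner and register rigorous cleanup hooks in the runner's .env file (for example via ACTIONS_RUNNER_HOOK_JOB_COMPLETED), or use containerized ephemeral runners.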
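A minimal sketch of the ephemeral registration, assuming the commands are run from the runner's install directory. The function name register_ephemeral_runner and the DRY_RUN switch are hypothetical helpers added for illustration; OWNER/REPO and the token are placeholders. The --url, --token, --ephemeral, and --unattended options are the ones config.sh documents, but check them against your runner version before relying on this.

```shell
#!/bin/sh
# Hypothetical wrapper around the runner's config.sh and run.sh.
# With DRY_RUN=1 it only prints the command (including the token) so it can be reviewed first.
register_ephemeral_runner() {
  url="$1"
  token="$2"
  # Build the argument list once so the dry run and the real run stay identical.
  set -- ./config.sh --url "$url" --token "$token" --ephemeral --unattended
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    # Register, then start the runner; it exits after a single job.
    "$@" && ./run.sh
  fi
}
```

Usage, under the same assumptions: `DRY_RUN=1 register_ephemeral_runner "https://github.com/OWNER/REPO" "$RUNNER_TOKEN"` prints the command, and dropping DRY_RUN runs it. Because the runner exits after one job, something outside this script (a service manager, an autoscaling group, or a container orchestrator) has to supply the next fresh runner.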
Journey Context:
A developer provisions a self-hosted runner on a persistent EC2 instance for specialized GPU builds. Initially, jobs run successfully. After several days, builds begin failing with "docker: write /var/lib/docker/tmp/...: no space left on device" errors or with conflicts over existing Docker container names. The developer discovers that the _work directory and Docker volumes are not being cleaned between runs. Manual cleanup with docker system prune and rm -rf _work/* resolves the issue temporarily. Researching the GitHub documentation, the developer learns that self-hosted runners do not auto-clean the environment between jobs for performance reasons, which leads to state pollution. To prevent this permanently, the developer re-registers the runner with the --ephemeral flag so that the runner process exits after each job; the autoscaling group then terminates and replaces the underlying VM, guaranteeing a pristine environment for every build.
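The manual cleanup described above could be scripted roughly as follows. This is a sketch under stated assumptions, not a verified fix: RUN_CLEANUP and RUNNER_WORK_DIR are hypothetical variable names, the default work directory path is a guess at a typical install location, and the function names are illustrative. It could be run by hand while the runner is idle, or pointed to from a job-completed hook; whether deleting the whole work directory is safe while the runner is still finishing a job has not been checked here.

```shell
#!/bin/sh
# Sketch of a cleanup script; RUN_CLEANUP and RUNNER_WORK_DIR are hypothetical names.

# Delete everything inside the given directory (dotfiles included), keeping the directory itself.
clean_workdir() {
  [ -d "$1" ] || return 0
  find "$1" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
}

# Remove stopped containers and unused networks, images, and volumes.
# Running containers are left alone, so a name conflict with one needs separate handling.
# Depending on the Docker version, named volumes may be kept.
prune_docker() {
  command -v docker >/dev/null 2>&1 || return 0
  docker system prune --all --force --volumes
}

# The destructive steps run only when explicitly enabled.
if [ "${RUN_CLEANUP:-0}" = "1" ]; then
  clean_workdir "${RUNNER_WORK_DIR:-$HOME/actions-runner/_work}"
  prune_docker
fi
```

The opt-in guard is a deliberate choice: the script does nothing unless RUN_CLEANUP=1 is set, which makes it harder to wipe a directory by accident while testing. This only reduces state pollution on a persistent host; it does not give the same guarantee as replacing the machine after every job.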
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:47:40.902272+00:00— report_created — created