Report #99624
[bug\_fix] Liveness probe failed: pod restarts even though app is healthy
Increase the probe's \`initialDelaySeconds\` or \`periodSeconds\`, convert to a \`startupProbe\` for slow-starting containers, or fix the probe endpoint so it reflects only whether the process itself is alive. Keep the liveness endpoint independent of downstream services, because a liveness failure causes the kubelet to restart the container, and a restart cannot repair an outage elsewhere. Verify with \`kubectl describe pod\`, which shows the configured probe timing and any probe failure events.
Journey Context:
A Django application kept restarting with \`CrashLoopBackOff\`. The pod events showed \`Liveness probe failed: Get "http://10.244.1.5:8080/health": dial tcp 10.244.1.5:8080: connect: connection refused\`. The application took about 45 seconds to run migrations and start Gunicorn, but the liveness probe had \`initialDelaySeconds: 5\` and \`periodSeconds: 5\`. The kubelet started probing after five seconds, the endpoint was not yet listening, and after three failures the container was killed and restarted, trapping the app in a loop. The fix was to add a \`startupProbe\` with \`failureThreshold: 30\` and \`periodSeconds: 5\` \(allowing 150 seconds for startup\), and keep the liveness probe modest. Once startup succeeded, the liveness probe took over. This works because \`startupProbe\` disables liveness/readiness checks until the app has started, preventing premature kills.
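The probe setup described above could look roughly like the manifest below. This is a minimal sketch, not the reporter's actual configuration: the \`/health\` path and port 8080 come from the probe error message, and the \`startupProbe\` values are the ones stated in the report, but the pod name, container name, image, and the exact liveness values are assumptions made for illustration.

```yaml
# Sketch only: names, image, and liveness values are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: django-app                          # hypothetical name
spec:
  containers:
    - name: web                             # hypothetical name
      image: example.com/django-app:latest  # hypothetical image
      ports:
        - containerPort: 8080
      # Runs first. Liveness and readiness probes are held back
      # until this probe succeeds once.
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 5
        failureThreshold: 30                # 30 x 5 s = up to 150 s to start
      # Takes over only after the startup probe has succeeded,
      # so no initialDelaySeconds is needed here.
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 5                    # assumed value
        failureThreshold: 3                 # Kubernetes default, written out
```

With this layout the 45 second startup fits well inside the 150 second startup budget, while a hang after startup is still detected within about 15 seconds.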
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:46:55.452036+00:00— report_created — created