Report #100060
[bug_fix] OOMKilled
Run `kubectl describe pod` and check `Last State: Terminated` for `Reason: OOMKilled` and `Exit Code: 137`. Together these mean the container exceeded its memory limit and the kernel OOM killer terminated the process with SIGKILL (137 = 128 + 9). First profile actual usage with `kubectl top pod` (which relies on metrics-server) or with in-container instrumentation. If the spike is legitimate (batch job, import), raise the container's `resources.limits.memory` and raise `requests.memory` in proportion. If usage grows without bound, it is a leak: fix the application, because a higher limit only delays the kill. Do not remove limits; size them from observed peak RSS, not the average.
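The same check can be scripted against the pod object instead of read by eye. The sketch below is a minimal illustration, assuming the pod JSON has the shape returned by `kubectl get pod <name> -o json`, where each entry in `status.containerStatuses` may carry `lastState.terminated.reason`. The function name `find_oom_killed` and the sample pod are hypothetical.

```python
import json
from typing import List


def find_oom_killed(pod: dict) -> List[str]:
    """Return names of containers whose last termination reason was OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            hits.append(status["name"])
    return hits


# Hypothetical, heavily trimmed pod object used only for illustration.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "importer", "restartCount": 4,
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0, "lastState": {}}
    ]
  }
}
""")

print(find_oom_killed(sample_pod))
```

In practice the dictionary would come from parsing real `kubectl` output rather than an inline sample.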
Journey Context:
A service runs fine under low traffic, but its pods restart during nightly batch imports. `kubectl get pods` shows the restart count climbing. `kubectl describe pod` shows `Reason: OOMKilled` and `Exit Code: 137`. You check metrics-server and see memory climb to the limit right before termination. The Deployment has `limits.memory: 512Mi`. You profile the import and see a peak RSS of about 1.2 GiB. After you raise the limit to `2Gi` and the request to `1Gi`, the batch job completes without restarts.
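One way to turn an observed peak into a limit is to apply a headroom factor and round up to a convenient step. The sketch below is an assumption-laden illustration, not a rule from this report: the 1.5x headroom, the 256Mi rounding step, and the helper names `to_bytes` and `suggest_limit_mi` are all hypothetical, and the parser handles only the `Ki`, `Mi`, and `Gi` suffixes.

```python
import math

# Binary suffixes only; other Kubernetes quantity forms are not handled here.
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> float:
    """Convert a quantity such as '512Mi' or '1.2Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[:-len(suffix)]) * factor
    return float(quantity)  # no suffix: treat the value as plain bytes


def suggest_limit_mi(peak: str, headroom: float = 1.5, step_mi: int = 256) -> int:
    """Multiply peak usage by headroom, then round up to a multiple of step_mi."""
    needed_mi = to_bytes(peak) * headroom / _UNITS["Mi"]
    return math.ceil(needed_mi / step_mi) * step_mi


# Peak RSS observed during the import in the journey above.
limit_mi = suggest_limit_mi("1.2Gi")
print(f"limits.memory: {limit_mi}Mi")
```

With these assumed parameters the 1.2 GiB peak yields 2048Mi, which matches the limit chosen above. A different headroom factor is reasonable if the workload's peak varies more from run to run.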
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:35:41.677799+00:00— report_created — created