Report #3300
[bug_fix] OOMKilled: container exceeded its memory limit
Run `kubectl describe pod` to confirm that the container was killed with reason `OOMKilled`. Increase the container's memory request and limit in the Deployment (or Pod spec) and, if no node can fit the larger request, add node memory or move the workload to a node pool with larger nodes. Also profile the workload and fix memory leaks; raising limits without fixing a leak only postpones the next OOM kill. Keep the limit at or above the request (the API server rejects a request larger than the limit) and above the application's steady-state RSS.
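As a rough pre-deploy check for the last point, the Python sketch below converts Kubernetes memory quantities such as `256Mi` or `1Gi` to bytes and flags a limit that sits below the request or below a measured steady-state RSS. The helper names `to_bytes` and `check_memory` are hypothetical. The parser handles only plain decimal numbers with the binary (`Ki` to `Ei`) or decimal (`k` to `E`) suffixes, not the full quantity grammar (exponent forms such as `128e6` are rejected), and the RSS figure is assumed to come from your own profiling.

```python
from decimal import Decimal
import re

# Multipliers for the memory suffixes this sketch supports: binary (Ki..Ei)
# and decimal (k..E). An empty suffix means plain bytes.
_MULTIPLIERS = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
# A number followed by an optional suffix; exponent forms are not handled.
_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def to_bytes(quantity: str) -> int:
    """Convert a memory quantity like "256Mi" or "1Gi" to a whole number of bytes."""
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.group(1), match.group(2) or ""
    # Decimal keeps values such as "1.5Gi" exact before truncating to bytes.
    return int(Decimal(number) * _MULTIPLIERS[suffix])


def check_memory(request: str, limit: str, steady_rss_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    request_b, limit_b = to_bytes(request), to_bytes(limit)
    if limit_b < request_b:
        problems.append(f"limit {limit} is below request {request}")
    if limit_b < steady_rss_bytes:
        problems.append(
            f"limit {limit} is below the measured steady-state RSS "
            f"({steady_rss_bytes} bytes)"
        )
    return problems
```

For example, `check_memory("512Mi", "1Gi", 600 * 2**20)` returns an empty list, while pairing a `256Mi` limit with a `512Mi` request and the same RSS reports both problems.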
Journey Context:
A Python batch worker kept restarting with `OOMKilled`. `kubectl describe pod` showed `Last State: Terminated, Reason: OOMKilled, Exit Code: 137` (137 = 128 + 9, i.e. the process was killed by SIGKILL). The Deployment had `limits.memory: 256Mi`, but the process allocated large arrays during ingestion. We raised the limit to `1Gi` and the request to `512Mi`, and the pod ran to completion. Later profiling revealed that a DataFrame was not being freed between batches, so we added an explicit `del` and a `gc.collect()` call to reduce actual usage. Without the describe output we would have guessed at application errors instead of a hard memory cap.
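The batch-loop fix can be sketched in plain Python. The functions `load_batch`, `summarize` and `run` are hypothetical stand-ins for the real ingestion and DataFrame work (lists of floats replace the DataFrame so the sketch needs no third-party packages), and `tracemalloc` is used only to compare peak Python-level allocations, which is not the same as the container memory usage that the limit enforces. What it illustrates: if the previous batch is still bound to a variable while the next one is being built, two batches are alive at once, so the peak is roughly twice the size of a single batch.

```python
import gc
import tracemalloc


def load_batch(batch_id: int, rows: int = 50_000) -> list[list[float]]:
    """Hypothetical stand-in for reading one ingestion batch (e.g. into a DataFrame)."""
    return [[float(batch_id), float(i)] for i in range(rows)]


def summarize(batch: list[list[float]]) -> float:
    """Hypothetical per-batch work that returns a small result."""
    return sum(row[1] for row in batch)


def run(batch_ids, free_between_batches: bool) -> tuple[list[float], int]:
    """Process every batch and return (results, peak traced bytes)."""
    tracemalloc.start()
    results = []
    for batch_id in batch_ids:
        # Without the del below, `batch` still points at the previous batch while
        # load_batch builds the next one, so two batches are alive at the peak.
        batch = load_batch(batch_id)
        results.append(summarize(batch))
        if free_between_batches:
            del batch     # drop the last reference; CPython frees the lists now
            gc.collect()  # also reclaim objects held only by reference cycles
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return results, peak
```

Calling `run(range(3), free_between_batches=False)` and `run(range(3), free_between_batches=True)` should return the same results, with a noticeably lower peak for the second call. For plain lists like these, `del` alone is what frees the memory in CPython; `gc.collect()` only helps when objects are kept alive by reference cycles, and freed memory is not always returned to the operating system right away.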
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:28:33.244854+00:00 - report_created - created