Report #9106
[gotcha] multiprocessing deadlock or CUDA error on Linux when using PyTorch/TensorFlow with the default 'fork' start method
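Explicitly call `multiprocessing.set_start_method('spawn', force=True)` once, at the start of the `if __name__ == '__main__':` block of your entry-point module. It must run before any worker process or pool is created and before the parent process initializes CUDA; setting it before importing torch/tensorflow is the safest habit. Alternatively, use `'forkserver'` where the platform offers it (e.g. Linux): workers are then forked from a separate, freshly started server process rather than from your main process, so they do not inherit its CUDA state.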
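A minimal sketch of the fix, assuming a plain `multiprocessing.Pool` workload. The names `square` and `main` are hypothetical, and the worker does CPU work as a stand-in for code that would import torch/tensorflow and use the GPU; the point is only where `set_start_method` goes.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Hypothetical worker. In a real job this is where the child would
    # import torch/tensorflow and touch the GPU; under 'spawn' it starts
    # from a fresh interpreter, so it can initialize CUDA itself.
    return x * x


def main() -> list[int]:
    # The pool is created only after the start method has been set.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(4))


if __name__ == "__main__":
    # Runs only in the parent: under 'spawn', children re-import this
    # module as '__mp_main__' and skip this block, which is why the
    # guard is required. force=True overrides a method already fixed.
    mp.set_start_method("spawn", force=True)
    print(main())  # [0, 1, 4, 9]
```

Calling `set_start_method` outside the guard, or after a pool already exists, is where this usually goes wrong; keeping it as the first line of the guarded block avoids both.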
Journey Context:
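On Linux, Python's multiprocessing has historically defaulted to 'fork' (Python 3.14 switched the Linux default to 'forkserver'). Fork duplicates the parent process (copy-on-write), including its open file descriptors and any CUDA driver state. The CUDA runtime is not fork-safe: the child inherits a CUDA context it cannot use, which shows up as deadlocks, illegal memory accesses, or errors such as 'CUDA error: invalid device context'. macOS (since Python 3.8) and Windows default to 'spawn', which starts a fresh interpreter; that is why the same code works locally but fails on Linux servers. The constraint is architectural: the start method must be set before the parent initializes CUDA and before any worker is created. In PyTorch, `import torch` alone does not create a CUDA context (initialization is lazy), but the first GPU operation does, so setting the start method before framework imports is the safest habit. This makes it a global bootstrap requirement, not a local code change.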
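The difference between "copies the parent" and "fresh interpreter" can be seen without a GPU. This sketch uses a hypothetical module-level dict, `STATE`, as a stand-in for parent-only state such as an initialized CUDA context (it is not real driver state); the parent mutates it after import, and each start method reports whether a child sees that mutation.

```python
import multiprocessing as mp

# Hypothetical stand-in for parent-only state (e.g. a CUDA context).
STATE = {"initialized": False}


def child_view(_: int) -> bool:
    # Reports whether this child sees the parent's post-import mutation.
    return STATE["initialized"]


def run_with(method: str) -> bool:
    # get_context() picks a start method locally, without changing the
    # global default, so all methods can be compared in one run.
    ctx = mp.get_context(method)
    with ctx.Pool(processes=1) as pool:
        return pool.map(child_view, [0])[0]


if __name__ == "__main__":
    STATE["initialized"] = True  # parent "initializes" after import
    print("default start method:", mp.get_start_method())
    for method in mp.get_all_start_methods():
        print(method, "-> child inherited parent state:", run_with(method))
```

On Linux, 'fork' should report `True` (the child carries the parent's memory, which is exactly how an unusable CUDA context gets inherited), while 'spawn' and 'forkserver' should report `False` because their children re-import the module from scratch.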
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T07:17:40.254072+00:00— report_created — created