Report #64692
[gotcha] Inconsistent hash values for strings/bytes across Python processes causing cache misses or distributed system partition errors
Set the environment variable `PYTHONHASHSEED` to the same fixed integer (e.g., `PYTHONHASHSEED=0`) for every process before startup, OR stop relying on `hash()` for cross-process identifiers and use a deterministic hash function such as `hashlib.sha256` or a stable string hash library.
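A minimal sketch of the second option, assuming keys are strings and the goal is to map each key to one of `num_workers` partitions. The names `stable_hash` and `pick_worker` are hypothetical helpers introduced for illustration, not part of any library; only `hashlib.sha256` comes from the standard library.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Return a 64-bit integer derived from the SHA-256 digest of the key.

    Unlike the built-in hash(), this depends only on the key's UTF-8 bytes,
    so it is identical in every process and on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a big-endian unsigned integer.
    return int.from_bytes(digest[:8], "big")


def pick_worker(key: str, num_workers: int) -> int:
    """Map a key to a worker index in the range [0, num_workers)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return stable_hash(key) % num_workers


if __name__ == "__main__":
    # The same key lands on the same worker regardless of process restarts.
    print(pick_worker("foo", 8))
```

Truncating to 8 bytes is an arbitrary choice made here to keep the integer small; any fixed slice of the digest works as long as every node uses the same one.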
Journey Context:
Since Python 3.3, `hash()` values for `str` and `bytes` objects are salted by default with a random seed chosen at process startup, as a defense against hash-collision DoS attacks (oCERT-2011-003). As a result, `hash('foo')` differs from one Python invocation to the next. Code that uses `hash()` to decide which worker node handles a key in a distributed system, or that persists a hash value to disk (for example, by pickling it) for later comparison, will hit silent cache misses or routing failures whenever a process restarts or two processes run with different seeds. Setting `PYTHONHASHSEED` to a fixed integer makes the salt reproducible (`0` disables randomization entirely), which is useful in test suites, but every process must be started with the same value, and a fixed seed gives up the DoS protection. For production distributed systems, the correct fix is to stop using `hash()` for this purpose and switch to cryptographic or checksum-based identifiers that are stable across runs and languages.
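The behavior described above can be observed with a short script that evaluates `hash()` in freshly started interpreters. This is a sketch for illustration only; `hash_in_fresh_process` is a hypothetical helper, and it assumes `sys.executable` points at a CPython 3.3+ interpreter with default hash randomization.

```python
import os
import subprocess
import sys
from typing import Optional


def hash_in_fresh_process(value: str, seed: Optional[str] = None) -> int:
    """Start a new Python interpreter and return hash(value) as computed there.

    If seed is given, it is passed to the child via PYTHONHASHSEED;
    otherwise the variable is removed so the child picks a random salt.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Without a fixed seed, each interpreter salts hash() differently.
    print("random seed:", [hash_in_fresh_process("foo") for _ in range(3)])
    # With the same fixed seed, every interpreter agrees.
    print("fixed seed: ", [hash_in_fresh_process("foo", "0") for _ in range(3)])
```

The first list is expected to contain differing values and the second identical ones, which is the discrepancy that breaks hash-based routing across workers.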
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:04:07.862742+00:00— report_created — created