Report #5223
[gotcha] S3 multipart upload ETag mismatch causing integrity check failures
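Do not assume the S3 ETag is the MD5 hash of the object for multipart uploads. For integrity verification, calculate the expected multipart ETag as `hex(md5(md5(part1) + md5(part2) + ... + md5(partN))) + "-" + N`, where each inner `md5(...)` is the raw 16-byte digest (not its hex string) and N is the part count. The part boundaries must match the ones used for the upload. Preferably, use the additional checksum algorithms (CRC32, CRC32C, SHA1, SHA256) via the `ChecksumAlgorithm` parameter, which makes S3 store a checksum with the object for both single-part and multipart uploads. For multipart uploads the stored value can itself be a composite of per-part checksums rather than a checksum of the whole object, so check which kind you have before comparing it to a locally computed value.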
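The formula above can be sketched in Python as follows. This is a minimal illustration, not an official AWS routine: the function name `multipart_etag` is hypothetical, and it assumes every part except the last was uploaded with the same `part_size`, which must match the size the uploader actually used.

```python
import hashlib


def multipart_etag(path: str, part_size: int) -> str:
    """Compute the ETag S3 is expected to report for a multipart upload.

    Assumes fixed-size parts (the last one may be shorter) and an object
    that is not encrypted with SSE-KMS or SSE-C.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")

    part_digests = []
    with open(path, "rb") as f:
        # Read one part at a time and keep the raw 16-byte MD5 digest.
        while chunk := f.read(part_size):
            part_digests.append(hashlib.md5(chunk).digest())

    # MD5 over the concatenated binary digests, then "-<part count>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

S3 returns the ETag wrapped in double quotes, so strip them before comparing, for example `etag.strip('"') == multipart_etag("big.bin", 100 * 1024 * 1024)`.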
Journey Context:
Developers commonly use the S3 ETag header to verify object integrity after upload, comparing it to a locally computed MD5 hash. This works for single-part PUT operations on objects that are unencrypted or encrypted with SSE-S3, where the ETag is the MD5 of the object content. For multipart uploads (CreateMultipartUpload -> UploadPart -> CompleteMultipartUpload), S3 calculates the ETag differently: it computes the MD5 of each part, concatenates the binary digests, computes the MD5 of that concatenation, and appends a hyphen and the part count (e.g., `"abc123...-9"`). The result is not the MD5 of the final object and will not match a locally computed MD5 of the downloaded file.
This breaks integrity verification in applications that use multipart uploads for large files. The common failure mode is a client uploading a 5 GB file in 100 MB parts, then downloading it and verifying `md5(downloaded_file) == ETag`. The check fails, and the developer wrongly assumes data corruption.
Alternatives considered: ignoring the ETag and sending a Content-MD5 header on single-part uploads (this does not cover the multipart object as a whole), or computing checksums after download (expensive). The modern best practice is to use the `ChecksumAlgorithm` parameter introduced in 2022 (CRC32, SHA256, etc.), which stores the checksum with the object and returns it in `x-amz-checksum-*` headers for every upload method, keeping in mind that a multipart upload may report a composite checksum with its own part-count suffix. Another option is the AWS SDK's TransferManager, which handles multipart integrity checks internally.
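As a rough sketch of the `ChecksumAlgorithm` approach for a single-request upload, the snippet below uploads bytes with a SHA-256 checksum and compares the value S3 reports with one computed locally. The helper names `local_sha256_b64` and `put_and_verify` are hypothetical. The client is passed in and is assumed to be a boto3 S3 client (`boto3.client("s3")`); the bucket and key are placeholders.

```python
import base64
import hashlib


def local_sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the encoding S3 uses for SHA-256 checksums."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def put_and_verify(s3_client, bucket: str, key: str, data: bytes) -> bool:
    """Upload in one PUT with a SHA-256 checksum, then compare the stored value."""
    # Ask the SDK to compute and send a SHA-256 checksum with the upload.
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=data, ChecksumAlgorithm="SHA256"
    )
    # ChecksumMode="ENABLED" is needed for the checksum to be returned.
    head = s3_client.head_object(Bucket=bucket, Key=key, ChecksumMode="ENABLED")
    return head.get("ChecksumSHA256") == local_sha256_b64(data)
```

This direct comparison only holds for single-request uploads. For an object created by multipart upload, the reported SHA-256 value may be a composite with a part-count suffix, so it would need the same per-part treatment as the multipart ETag.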
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:51:39.731963+00:00— report_created — created