Report #78117
[tooling] Rewriting Git history (removing large files, splitting a subdirectory into a new repo, removing sensitive data) using `git filter-branch` is excruciatingly slow, memory-intensive, and prone to leaving backup refs that confuse users
Use `git filter-repo` (install via pip, pacman, or brew). To extract a subdirectory into a new repo: `git filter-repo --path src/subdir/ --path-rename src/subdir/:` (equivalent to `git filter-repo --subdirectory-filter src/subdir`). To remove files larger than 10 MB: `git filter-repo --strip-blobs-bigger-than 10M`. To remove sensitive strings: `git filter-repo --replace-text <(echo 'secret_key==>REMOVED')` (process substitution needs bash or zsh; a plain expressions file works too). Always run from a fresh clone (`git clone --mirror`) and verify with `git log --stat` before force-pushing.
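The steps above can be combined so that the rewrite always happens in a throwaway mirror clone. This is a minimal sketch that assumes a POSIX shell and `git-filter-repo` installed on `PATH`. `rewrite_mirror` is a hypothetical helper name and is not part of Git or filter-repo. It stops after printing `git log --stat`, so the force-push remains a separate, deliberate step.

```shell
# rewrite_mirror SOURCE WORKDIR [git-filter-repo options...]
# Hypothetical helper: clones SOURCE as a fresh mirror into WORKDIR,
# rewrites it with git filter-repo, then prints the history for review.
# Assumes git-filter-repo is installed and on PATH.
rewrite_mirror() {
    src=$1
    dest=$2
    shift 2

    # A brand-new --mirror clone passes filter-repo's fresh-clone check,
    # so --force is not needed and the source repository is never touched.
    git clone --quiet --mirror "$src" "$dest" || return 1

    # Apply whatever filter the caller asked for, e.g.
    #   --strip-blobs-bigger-than 10M
    #   --path src/subdir/ --path-rename src/subdir/:
    git -C "$dest" filter-repo "$@" || return 1

    # Review step: inspect the rewritten history before any force-push.
    git -C "$dest" log --stat --all
}
```

For example: `rewrite_mirror https://example.com/org/app.git app.git --strip-blobs-bigger-than 10M` (the URL is hypothetical). The push is deliberately left outside the helper, because a rewritten history should be reviewed by a person before it replaces the remote.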
Journey Context:
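`git filter-branch` is a shell script that runs user-supplied shell filters once per commit. With `--tree-filter` it checks out every commit into a temporary working tree, applies the filter, and re-commits, forking several processes per commit along the way. Total cost therefore grows roughly as O(n*m), where n is the number of commits and m is the per-commit cost of the filter. Typical invocations also need incantations such as `--tag-name-filter cat --prune-empty`. Every run leaves backups under `refs/original/`, so a rerun aborts with "A previous backup already exists in refs/original/" until those refs are deleted or `-f` is passed.
`git filter-repo` is a separate tool written in Python 3. Instead of checking out commits, it streams history through `git fast-export` and `git fast-import`, and it is commonly reported to be 10-100x faster. It rewrites tags automatically and takes care of remotes and reflog cleanup itself. Critical safety: `filter-repo` refuses to run on a repository that does not look like a fresh clone (unless `--force` is passed), so a mistaken rewrite cannot destroy the only copy of unpushed or uncommitted work. Unlike `filter-branch`, it writes a `commit-map` file (in `.git/filter-repo/`) that maps old commit SHAs to new ones, which helps when updating SHAs pinned in CI/CD pipelines. The `--path` filter is inclusive: only the listed paths are kept. Adding `--invert-paths` flips this, so the listed paths are removed and everything else is kept.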
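To update SHAs pinned in CI/CD configuration, the commit-map can be queried one SHA at a time. This is a minimal sketch that assumes the map's usual layout: a header line (`old new`) followed by one `<old-sha> <new-sha>` pair per line. `new_sha_for` is a hypothetical helper, not a filter-repo command.

```shell
# new_sha_for COMMIT_MAP OLD_SHA
# Hypothetical helper: print the rewritten SHA that git filter-repo
# recorded for OLD_SHA. Assumes the commit-map layout of one header line
# followed by "<old-sha> <new-sha>" pairs. Returns non-zero if OLD_SHA
# is not listed.
new_sha_for() {
    awk -v old="$2" '
        # Skip the header line; match the old SHA exactly in column 1.
        NR > 1 && $1 == old { print $2; found = 1; exit }
        # Exit status 0 when a mapping was printed, 1 otherwise.
        END { exit !found }
    ' "$1"
}
```

Inside the rewritten repository, a call would look like `new_sha_for "$(git rev-parse --git-dir)/filter-repo/commit-map" <old-sha>`. Matching is exact, so abbreviated SHAs copied from CI logs must be expanded to full length first. That strictness is intentional, because it avoids silently picking the wrong commit when two prefixes collide.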
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:42:51.698340+00:00— report_created — created