Remove history for everything except a list of files using git filter-branch

Posted by 南楼画角 on 2020-05-23 07:11:06

Question


I'm trying to move some files between two git repositories repo1 and repo2. I have a short list of files I'd like to move (preserving history).

Three files to move from repo1:

libraryname/file1
libraryname/file2
tests/libraryname/file3

There are other files in libraryname/ and tests/libraryname/, and other folders in / and tests/.

My plan is to checkout repo1, then modify the history tree until it only contains history for the files I'm interested in. Then checkout repo2, and merge in the output of the previous operation. It seems like git filter-branch is the right tool for the first step.

So far I've tried git filter-branch --index-filter 'git rm -r --cached <FILES>', where <FILES> lists every unwanted folder or file.

But this leaves a lot of folders which no longer exist at HEAD, but have existed at some point in this repository's lifetime. It seems quite tedious to figure out everything that has ever existed in the history of this repo; there must be a better way.

How do I end up with a git commit tree which only includes these three files? Is there a better way than the one I'm suggesting? Or is there a way to remove all traces of files which don't currently exist at HEAD?


Answer 1:


You said it leaves behind folders; I assume you mean it leaves behind files in those folders (because git doesn't preserve empty folders)...

It seems like you might want to take the approach of clearing the index and then re-adding the entries you want.

git filter-branch ...
    --index-filter 'git rm -r -q --cached --ignore-unmatch -- . && git reset -q "$GIT_COMMIT" -- libraryname/file1 libraryname/file2 tests/libraryname/file3'
    ...

Since you're thinning out the content so much, you will probably also want the --prune-empty option, which drops commits that end up touching none of the kept files.
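To see the clear-then-re-add idea in action, here is a minimal sketch that runs it against a throwaway repository. The repository name (repo1-copy), the extra files, and the commit messages are all invented for the demo; only the three kept paths come from the question. It passes `.` with --ignore-unmatch as the pathspec for git rm, which is one way to empty the index and not necessarily what the answerer had in mind.

```shell
#!/bin/sh
# Demo on a throwaway repository; every extra path and name below is made up.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's deprecation notice and pause

demo=$(mktemp -d)
cd "$demo"
git init -q repo1-copy
cd repo1-copy
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: one wanted file plus unwanted neighbours.
mkdir -p libraryname other
echo one > libraryname/file1
echo junk > libraryname/unwanted
echo old > other/old
git add .
git commit -q -m "add file1 and friends"

# Commit 2: touches only unwanted content, so --prune-empty should drop it.
git rm -q -r other
git commit -q -m "drop other/"

# Commit 3: the remaining wanted files plus one more unwanted file.
mkdir -p tests/libraryname
echo two > libraryname/file2
echo three > tests/libraryname/file3
echo extra > tests/libraryname/extra
git add .
git commit -q -m "add file2 and file3"

# Empty the index of each commit, then restore only the wanted paths from it.
git filter-branch --prune-empty --index-filter '
    git rm -r -q --cached --ignore-unmatch -- . &&
    git reset -q "$GIT_COMMIT" -- libraryname/file1 libraryname/file2 tests/libraryname/file3
' HEAD >/dev/null

kept_files=$(git ls-files)
commit_count=$(git rev-list --count HEAD)
printf 'files kept:\n%s\ncommits left: %s\n' "$kept_files" "$commit_count"
```

Running it on a scratch copy like this first is a cheap way to check the path list before rewriting a real repository.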




Answer 2:


With Git 2.24 (Q4 2019), git filter-branch is deprecated.

The equivalent, using newren/git-filter-repo, is described in its examples section:

If you have a long list of files, directories, globs, or regular expressions to filter on, you can stick them in a file and use --paths-from-file; for example, with a file named stuff-i-want.txt with contents of

README.md
guides/
tools/releases
glob:*.py
regex:^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}.txt$
tools/==>scripts/
regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt

then you could run

git filter-repo --paths-from-file stuff-i-want.txt

In your case, stuff-i-want.txt would be:

libraryname/file1
libraryname/file2
tests/libraryname/file3
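Filtering covers only the first half of the plan in the question; the filtered history still has to be merged into repo2. A rough sketch of that step, assuming two throwaway repositories (filtered-repo1 stands in for repo1 after filtering, and the remote name `filtered` is arbitrary), could look like this. It relies on git merge --allow-unrelated-histories, since the two repositories share no common ancestor.

```shell
#!/bin/sh
# Demo with two throwaway repositories; all names and contents are made up.
set -e
demo=$(mktemp -d)
cd "$demo"

# Stand-in for repo1 after filtering: only a wanted file and its history.
git init -q filtered-repo1
cd filtered-repo1
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir libraryname
echo one > libraryname/file1
git add .
git commit -q -m "add file1"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
cd ..

# Stand-in for repo2, with unrelated content of its own.
git init -q repo2
cd repo2
git config user.name "Demo User"
git config user.email "demo@example.com"
echo hello > existing.txt
git add .
git commit -q -m "repo2 initial commit"

# Fetch the filtered history and merge it into repo2.
git remote add filtered ../filtered-repo1
git fetch -q filtered
git merge -q --allow-unrelated-histories \
    -m "Import libraryname files from repo1" "filtered/$branch"
git remote remove filtered

# The imported file keeps its original commit in repo2's history.
imported_log=$(git log --format=%s -- libraryname/file1)
printf 'history of libraryname/file1 in repo2: %s\n' "$imported_log"
```

If both repositories contain a file at the same path, the merge will report conflicts that need resolving by hand; the demo avoids that case.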



Answer 3:


Here is a whitelist-based approach which might be faster (because it only needs to compare whole lines of pre-sorted lists) and easier if a large number of files is involved.

  1. Create a sorted list of all files in all commits of your branch:

    $ export LC_COLLATE=C whitelist="$(mktemp)" && git log --name-status | sed 's/^[A-Z][[:space:]]\{1,\}//; t; d' | sort -u > "$whitelist"

  2. Edit that list with your favorite text editor and delete every file you do not want to keep, i.e. turn it into a whitelist of files to keep.

    $ "$EDITOR" -- "$whitelist" # remove from list what you don't want to keep

  3. Perform the actual filter operation:

    $ git filter-branch -f --index-filter 'git ls-files -c | sort | comm -23 -- - "$whitelist" | while IFS= read -r f; do git rm --cached -- "$f"; done' --prune-empty

  4. Remove the white list once the filter operation worked without problems.

    $ rm -- "$whitelist" && unset LC_COLLATE whitelist
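The core of step 3 is the `comm -23` comparison, which prints the lines that appear only in the first of two sorted inputs. The sketch below isolates that step with two hand-written lists in place of `git ls-files` output and the edited whitelist; the file names index and whitelist, and the unwanted paths, are assumptions made for the demo. It sets LC_ALL=C, not just LC_COLLATE, so that sort and comm are sure to agree on ordering.

```shell
#!/bin/sh
# Demo of the whitelist comparison only; no repository is touched.
set -e
export LC_ALL=C   # byte-wise ordering, so sort and comm agree

tmp=$(mktemp -d)

# Hypothetical index contents of one commit (what git ls-files -c would print).
printf '%s\n' other/old libraryname/file1 libraryname/unwanted \
    tests/libraryname/file3 | sort > "$tmp/index"

# Whitelist of paths to keep, also sorted.
printf '%s\n' libraryname/file1 libraryname/file2 \
    tests/libraryname/file3 | sort > "$tmp/whitelist"

# -2 hides lines unique to the whitelist, -3 hides lines common to both,
# leaving only the paths that are in the index but not whitelisted.
to_remove=$(comm -23 "$tmp/index" "$tmp/whitelist")
printf '%s\n' "$to_remove"
# prints:
# libraryname/unwanted
# other/old

rm -r -- "$tmp"
```

In the real filter, each printed path is then handed to git rm --cached, so a whitelisted file that is absent from a given commit (libraryname/file2 here) is simply ignored.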



Source: https://stackoverflow.com/questions/45633033/remove-history-for-everything-except-a-list-of-files-using-git-filter-branch
