GIT Split Repository directory preserving move / renames history

后端未结


Let's say you have the repository:

myCode/megaProject/moduleA
myCode/megaProject/moduleB

Over time (months), you re-organise the project.


5 Answers

猫巷女王i

2020-12-15 08:55
I'm aware of no simple way to do this, but it can be done.

The problem with filter-branch is that it works by "applying custom filters on each revision".

If you can create a filter which won't delete your files they will be tracked between directories. Of course this is likely to be non-trivial for any repository which isn't trivial.

To start, assume it is a trivial repository: you have never renamed a file, and you have never had files with the same name in two modules. All you need to do is get a list of the file names in your module with find megaProject/moduleA -type f -printf "%f\n" > preserve (-printf requires GNU find), and then run your filter using those file names and your directory:

preserve.sh
```
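#!/bin/sh
# Build the find arguments: every regular file whose name is not in the preserve list.
set -- . -type f ! -name d1
while IFS= read -r f; do
  set -- "$@" ! -name "$f"
done < /path/to/myCode/preserve
# Delete the matches in one pass; quoting keeps names with spaces intact.
find "$@" -exec rm -f -- {} +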
```
git filter-branch --prune-empty --tree-filter '/path/to/myCode/preserve.sh' HEAD
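
Before rewriting any history, it may help to dry-run the same name-based filter on a scratch directory. The sketch below is only an illustration: the helper name prune_except and the sample files are hypothetical, and it shares the limitation of the trivial case that it matches on base names only.

```shell
# prune_except LIST DIR: delete every regular file under DIR whose base name
# is not listed (one name per line) in LIST. Hypothetical helper for a dry run.
prune_except() {
  list=$1
  dir=$2
  set --
  while IFS= read -r name; do
    set -- "$@" ! -name "$name"
  done < "$list"
  find "$dir" -type f "$@" -exec rm -f -- {} +
}

# Scratch tree standing in for one checked-out revision.
scratch=$(mktemp -d)
mkdir -p "$scratch/tree/megaProject/moduleA" "$scratch/tree/megaProject/moduleB"
touch "$scratch/tree/megaProject/moduleA/main.c" \
      "$scratch/tree/megaProject/moduleB/util.c" \
      "$scratch/tree/README"
printf '%s\n' main.c > "$scratch/preserve"

prune_except "$scratch/preserve" "$scratch/tree"
# Only files named in the preserve list should remain.
find "$scratch/tree" -type f
```

If the listing shows anything other than the module's files, the preserve list needs adjusting before filter-branch is run.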

Of course, it's renames that make this difficult. One of the nice things about git filter-branch is that it gives you the $GIT_COMMIT environment variable. You can then get fancy and use things like:
```
for f in megaProject/moduleA
do
 git log --pretty=format:'%H' --name-only --follow -- "$f" | awk '{ if ($0 != "") { printf "%s:", $0; next } print }'
done > preserve
```
to build a filename history, with commits, that could be used in place of the simple preserve file in the trivial example, but the onus is going to be on you to keep track of what files should be present at each commit. This actually shouldn't be too hard to code out, but I haven't seen anybody who's done it yet.
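
To make the record format concrete, here is a small sketch that runs the same awk join on canned input instead of a real repository. The shortened hashes, the old path moduleA/main.c and the function names are made up for illustration, and it assumes one file per commit block, which is what --follow on a single file produces.

```shell
# Canned stand-in for `git log --pretty=format:'%H' --name-only --follow` output.
sample_log() {
  printf '%s\n' 'c3c3c3' 'megaProject/moduleA/main.c' '' 'a1a1a1' 'moduleA/main.c'
}

# Join each commit block into one "hash:path:" record, as in the loop above.
join_records() {
  awk '{ if ($0 != "") { printf "%s:", $0; next } print }'
}

# One record per commit.
sample_log | join_records
echo
# The unique set of paths the file has ever had.
sample_log | join_records | cut -d: -f2 | sort -u
```

The second listing is the part a per-commit filter would need: every path under which the file has existed.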

野趣味

2020-12-15 09:02

This is a version based on @rksawyer's scripts, but it uses git-filter-repo instead. I found it was much easier to use and much much faster than git-filter-branch.

# This script should run in the same folder as the project folder is.
# This script uses git-filter-repo (https://github.com/newren/git-filter-repo).
# The list of files and folders that you want to keep should be named <your_repo_folder_name>_KEEP.txt. It should end with a newline after the last line, otherwise the last file/folder will be skipped.
# The result will be the folder called <your_repo_folder_name>_REWRITE_CLONE. Your original repo won't be changed.
# Tags are not preserved, see line below to preserve tags.
# Running subsequent times will backup the last run in <your_repo_folder_name>_REWRITE_CLONE_BKP.

# Define here the name of the folder containing the repo: 
GIT_REPO="git-test-orig"

clone="$GIT_REPO"_REWRITE_CLONE
temp=/tmp/git_rewrite_temp
rm -Rf "$clone"_BKP
mv "$clone" "$clone"_BKP
rm -Rf "$temp"
mkdir "$temp"
git clone "$GIT_REPO" "$clone"
cd "$clone"
git remote remove origin
open .
open "$temp"

# Comment line below to preserve tags
git tag | xargs git tag -d

echo 'Start logging file history...'
printf '# git log results:\n\n' > "$temp"/log.txt

while read p
do
    shopt -s dotglob
    find "$p" -type f > "$temp"/temp
    while read f
    do
        echo "## " "$f" >> "$temp"/log.txt
        # print every file and follow to get any previous renames
        # Then remove blank lines.  Then remove every other line to end up with the list of filenames       
        git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0' | tee -a "$temp"/log.txt

        printf '\n\n' >> "$temp"/log.txt
    done < "$temp"/temp
done < ../"$GIT_REPO"_KEEP.txt > "$temp"/PRESERVE

mv "$temp"/PRESERVE "$temp"/PRESERVE_full
awk '!a[$0]++' "$temp"/PRESERVE_full > "$temp"/PRESERVE

sort -o "$temp"/PRESERVE "$temp"/PRESERVE

echo 'Starting filter-repo --------------------------'
git filter-repo --paths-from-file "$temp"/PRESERVE --force --replace-refs delete-no-add
echo 'Finished filter-repo --------------------------'

It logs the result of git log into a file in /tmp/git_rewrite_temp/log.txt, so you can get rid of these lines if you don't need a log.txt and want it to run faster.
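
The header comments ask for a trailing newline in <your_repo_folder_name>_KEEP.txt because a plain while read loop drops a final line that is not newline-terminated. A common shell idiom, shown here as a standalone sketch rather than as part of the original script, also checks whether read left anything in the variable. The function name read_paths and the sample entries are illustrative.

```shell
# Echo every line from stdin, including a final line that lacks a newline.
read_paths() {
  while IFS= read -r p || [ -n "$p" ]; do
    printf '%s\n' "$p"
  done
}

# The second entry has no trailing newline, yet it is still processed.
printf 'megaProject/moduleA\nmegaProject/moduleB' | read_paths
```

Using the same loop condition for the outer loop of the script would make the trailing newline optional.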


余生分开走

2020-12-15 09:10

We painted ourselves into a much worse corner, with dozens of projects across dozens of branches, with each project dependent on 1-4 others, and 56k commits total. filter-branch was taking up to 24 hours just to split a single directory off.

I ended up writing a tool in .NET using libgit2sharp and raw file system access to split an arbitrary number of directories per project, and only preserve relevant commits/branches/tags for each project in the new repos. Instead of modifying the source repo, it writes out N other repos with only the configured paths/refs.

You're welcome to see if this suits your needs, modify it, etc. https://github.com/CurseStaff/GitSplit

北恋

2020-12-15 09:11
Running git filter-branch --subdirectory-filter in your cloned repository will remove all commits that don't affect content in that subdirectory, which includes those affecting the files before they were moved.

Instead, you need to use the --index-filter flag (or --tree-filter, as in the example below) with a script that deletes all files you're not interested in, and the --prune-empty flag to drop any commits that only affect other content.

There's a blog post from Kevin Deldycke with a good example of this:
```
git filter-branch --prune-empty --tree-filter 'find ./ -maxdepth 1 -not -path "./e107*" -and -not -path "./wordpress-e107*" -and -not -path "./.git" -and -not -path "./" -print -exec rm -rf "{}" \;' -- --all
```
This command effectively checks out each commit in turn, deletes all uninteresting files from the working directory and, if anything has changed from the last commit then it checks it in (rewriting the history as it goes). You would need to tweak that command to delete all files except those in, say, /moduleA, /megaProject/moduleA and the specific files you want to keep from /megaProject.
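
As a rough sketch of that tweak, the selection of files to delete can be written as a filter over a path listing. The function name paths_to_drop and the sample file names are hypothetical; the two kept prefixes are the /moduleA and /megaProject/moduleA locations mentioned above. The commented invocation shows how the same selection might be used with --index-filter, which works on the index and so avoids checking out every revision; treat it as an untested outline that assumes GNU xargs and paths without spaces or quotes.

```shell
# Print the paths that lie outside the current and former module locations.
# Reads one path per line, e.g. from `git ls-files`.
paths_to_drop() {
  grep -v -e '^moduleA/' -e '^megaProject/moduleA/'
}

# Sample listing (hypothetical file names) standing in for `git ls-files` output.
printf '%s\n' moduleA/main.c megaProject/moduleA/main.c \
              megaProject/moduleB/util.c README | paths_to_drop

# Inside filter-branch the same selection could drive `git rm --cached`:
#   git filter-branch --prune-empty --index-filter '
#     git ls-files |
#       grep -v -e "^moduleA/" -e "^megaProject/moduleA/" |
#       xargs -r git rm --cached -q --
#   ' -- --all
```

Individual files to keep from /megaProject would need their own -e patterns.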

忘了有多久

2020-12-15 09:16

Following on from the answer above: first, iterate through all of the files in the directory that is being kept, using git log --follow to get the old paths/names from prior moves/renames. Then use filter-branch to iterate through every revision, removing any files that are not on the list created in step 1.

#!/bin/bash
DIRNAME=dirD

# Catch all files including hidden files
shopt -s dotglob
for f in "$DIRNAME"/*
do
  # Print every file and follow it to get any previous renames.
  # Then remove blank lines, then remove every other line (the commit hashes) to end up with the list of filenames.
  git log --pretty=format:'%H' --name-only --follow -- "$f" | awk 'NF > 0' | awk 'NR%2==0'
done > /tmp/PRESERVE

sort -o /tmp/PRESERVE /tmp/PRESERVE
cat /tmp/PRESERVE
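
To see what this first step collects, the sketch below builds a throwaway repository with a single move and runs the same pipeline on one file. The repository layout, file contents and the function name old_and_new_paths are invented for the demonstration; it assumes git is installed and that the move is detected as a rename, which holds here because the file content is unchanged.

```shell
# Build a throwaway repository in which moduleA/ is later moved to megaProject/moduleA/.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir moduleA
echo 'int main(void) { return 0; }' > moduleA/main.c
git add moduleA/main.c
git commit -q -m 'Add moduleA'

mkdir megaProject
git mv moduleA megaProject/moduleA
git commit -q -m 'Move moduleA under megaProject'

# Same pipeline as the script above: drop blank lines, then drop the hash lines.
old_and_new_paths() {
  git log --pretty=format:'%H' --name-only --follow -- "$1" |
    awk 'NF > 0' | awk 'NR%2==0' | sort -u
}

# Lists both the current path and the path before the move.
old_and_new_paths megaProject/moduleA/main.c
```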

Then create a script (preserve.sh) that filter-branch will call for each revision.

#!/bin/bash
DIRNAME=dirD

# Delete everything that's not in the PRESERVE list
# List every file outside .git and $DIRNAME, one per line, without the leading ./
# (double quotes so that $DIRNAME is expanded).
find . -type f -not -path './.git/*' -not -path "./$DIRNAME/*" | cut -c3- | sort > /tmp/ALL2

#echo 'before:'
#cat /tmp/ALL2

# Lines that appear only in ALL2 are the files to delete.
comm -23 /tmp/ALL2 /tmp/PRESERVE > /tmp/DELETE_THESE
echo 'delete these:'
cat /tmp/DELETE_THESE
#exit 0

while IFS= read -r f; do
  rm -- "$f"
done < /tmp/DELETE_THESE

Now run filter-branch. If all files are removed in a revision, --prune-empty drops that commit and its message.

 git filter-branch --prune-empty --tree-filter '/FULL_PATH/preserve.sh' master
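
The comm -23 step in preserve.sh is what decides which files go. A minimal sketch on made-up file lists shows the effect; the file names and the scratch directory are hypothetical, and LC_ALL=C is set here only so that sort and comm agree on ordering.

```shell
# Use one collation for both sort and comm; comm expects its inputs sorted the same way.
export LC_ALL=C
work=$(mktemp -d)

# Files present in one revision (hypothetical names).
printf '%s\n' README megaProject/moduleA/main.c megaProject/moduleB/util.c |
  sort > "$work/ALL2"
# Every path the kept files have ever had.
printf '%s\n' megaProject/moduleA/main.c moduleA/main.c |
  sort > "$work/PRESERVE"

# Lines only in the first file: these are the files the filter would delete.
comm -23 "$work/ALL2" "$work/PRESERVE"
```

Entries in PRESERVE that do not exist in a given revision, such as moduleA/main.c here, are simply ignored for that revision.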
