Git find modified files since <ref> from a shallow clone

我的未来我决定 submitted on 2021-02-16 14:10:26

Question


I'm on a CI box running tests. To speed it up, I'm just doing a shallow clone:

git clone --depth 1 git@github.com:JoshCheek/some_repo.git

Assuming all the tests pass, I want to trigger the next step in the pipeline. What to trigger depends on which files changed between the last deployment (ref d123456) and the ref I just tested (ref c123456). With a normal clone, I could find out like this:

git diff --name-only d123456 c123456

But my clone is shallow, so it doesn't know about those commits. I see that I can use git fetch --depth=n to get more of the history, but I only know the SHA, not the depth of the SHA. Here are some hypothetical commands that would answer this question if they existed:

# hypothetical remote diff
git diff --name-only origin/d123456 origin/c123456

# hypothetical ref based fetch
git fetch --shallow-through d123456
git diff --name-only d123456 c123456

# hypothetical way to find the depth I need
depth=`git remote depth-to d123456`
git fetch --depth "$depth"
git diff --name-only d123456 c123456

Otherwise it seems like I might have to write a loop and keep invoking --deepen until my history contains the commit. That seems painful (meaning annoying to write / maintain) and expensive (meaning slow, remember that the purpose of the shallow clone is to reduce this cost).


Answer 1:


Otherwise it seems like I might have to write a loop and keep invoking --deepen until my history contains the commit. That seems painful ...

It is painful (and slow, as you note a bit later).
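For comparison, the loop in question might look roughly like the sketch below. So that it runs offline, it builds a throwaway five-commit repository in a temporary directory and treats its second commit as the last deployment. The paths, file names, and the doubling step size are illustrative assumptions, not part of the original setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "remote" with five commits (hypothetical stand-in for the real repo).
work=$(mktemp -d)
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com
git init -q "$work/remote"
for i in 1 2 3 4 5; do
  echo "$i" > "$work/remote/file$i.txt"
  git -C "$work/remote" add .
  git -C "$work/remote" commit -q -m "commit $i"
done
# Pretend the last deployment was commit 2 (three commits behind the tip).
deployed=$(git -C "$work/remote" rev-parse HEAD~3)

# The CI box starts from a depth-1 clone; file:// makes --depth apply to a local path.
git clone -q --depth 1 "file://$work/remote" "$work/ci"
cd "$work/ci"

# Deepen in doubling steps until the deployed commit exists locally.
step=1
until git cat-file -e "$deployed^{commit}" 2>/dev/null; do
  # If the clone is no longer shallow, the commit is not in this history at all.
  if [ "$(git rev-parse --is-shallow-repository)" != "true" ]; then
    echo "commit $deployed not found" >&2
    exit 1
  fi
  git fetch -q --deepen="$step"
  step=$((step * 2))
done

changed=$(git diff --name-only "$deployed" HEAD)
echo "$changed"
```

Each iteration is a separate round trip to the remote, which is where the cost comes from.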

Modern Git (since version 2.11) does have a new git fetch option:

--shallow-exclude=<revision>

    Deepen or shorten the history of a shallow repository to exclude commits reachable from a specified remote branch or tag. This option can be specified multiple times.

I have not tried this; it's not clear if it allows a hash ID (the tests use names) and in any case you would specify the parent(s) of the commit you want to deepen through, rather than the commit you want to obtain. But it might suffice.
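A minimal sketch of how this option could be used follows. Since the quoted documentation speaks of a remote branch or tag, it assumes the deployed commit carries a tag on the remote (the name last-deploy is hypothetical). It also assumes that a follow-up git fetch --deepen=1 is an acceptable way to pull in the excluded commit itself. The throwaway repository exists only so the example runs offline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "remote" with five commits (hypothetical stand-in for the real repo).
work=$(mktemp -d)
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com
git init -q "$work/remote"
for i in 1 2 3 4 5; do
  echo "$i" > "$work/remote/file$i.txt"
  git -C "$work/remote" add .
  git -C "$work/remote" commit -q -m "commit $i"
done
# Assumption: the deployed commit (commit 2) is tagged on the remote.
git -C "$work/remote" tag last-deploy HEAD~3
deployed=$(git -C "$work/remote" rev-parse last-deploy)

git clone -q --depth 1 "file://$work/remote" "$work/ci"
cd "$work/ci"

# Fetch the history that is NOT reachable from the tag (commits 3 to 5 here).
git fetch -q --shallow-exclude=last-deploy origin
# The tagged commit itself was excluded, so step one commit further back.
git fetch -q --deepen=1 origin

changed=$(git diff --name-only "$deployed" HEAD)
echo "$changed"
```

This takes two fetches regardless of how far back the deployment is, instead of an open-ended loop.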

(I really think a better method is to keep reference clones you can borrow from.)




Answer 2:


Besides a shallow clone, there are several ways to reduce clone time and disk usage.

1. git clone <url> -b <branch> --single-branch

This fetches only the data reachable from <branch>. It is not as effective as --depth=1 but still better than a full clone, and it helps most when the repo has many diverged branches.

2. git init; git fetch <url> <tag>

Similarly, this fetches only the data reachable from <tag>.

3. Create and use a mirror repo.

Create the mirror with git clone <url> --mirror -- /foo/mirror; /foo/mirror is then the mirror repo. Suppose your CI system starts multiple instances simultaneously. Clone each one via git clone <url> --reference=/foo/mirror -- <instanceN>. In each clone, only the data that cannot be found in the mirror repo is downloaded from the remote repo. You can delete an instance to save space when its job is done, but keep the mirror and update it regularly with git fetch, at a frequency that matches how often the remote repo changes: once a day at midnight, or once a week on Sunday, for example.

4. Use git worktree.

Make a clone, keep it, and update it first whenever a CI instance starts. Then use git worktree to check out revisions into a separate working tree for each instance.
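As a rough illustration of the mirror approach (option 3) applied to the original question, the sketch below creates a mirror, refreshes it, and makes a per-job clone that borrows objects from it. The temporary directory, the five-commit toy repository, and the names mirror.git and job1 are assumptions made so the example runs offline. On a real CI box the remote would be the actual repository URL.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "remote" with five commits (hypothetical stand-in for the real repo).
work=$(mktemp -d)
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com
git init -q "$work/remote"
for i in 1 2 3 4 5; do
  echo "$i" > "$work/remote/file$i.txt"
  git -C "$work/remote" add .
  git -C "$work/remote" commit -q -m "commit $i"
done
# Pretend the last deployment was commit 2 (three commits behind the tip).
deployed=$(git -C "$work/remote" rev-parse HEAD~3)

# One-time setup: a local mirror that outlives individual CI jobs.
git clone -q --mirror "file://$work/remote" "$work/mirror.git"

# Periodically (for example from a scheduled job): refresh the mirror.
git -C "$work/mirror.git" fetch -q --prune

# Per job: a full clone that borrows objects already present in the mirror.
git clone -q --reference "$work/mirror.git" "file://$work/remote" "$work/job1"

# Full history is available, so the plain diff from the question works.
changed=$(git -C "$work/job1" diff --name-only "$deployed" HEAD)
echo "$changed"
```

A clone made with --reference relies on the mirror's object store for as long as it exists, so clean up the job clones rather than the mirror.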




Answer 3:


I hit the same problem and used this

git clone --shallow-since=<date>

I had to store not only the SHA of my last deployment but also its date; otherwise it worked great.
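A small sketch of this approach, under stated assumptions: the stored date is a cutoff no later than the commit date of the deployed commit (otherwise that commit would fall outside the shallow history and the diff could not resolve it). The toy repository, its fixed commit dates, and the variable names are made up so the example runs offline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "remote" with five commits, one per day (hypothetical stand-in).
work=$(mktemp -d)
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com
git init -q "$work/remote"
for i in 1 2 3 4 5; do
  echo "$i" > "$work/remote/file$i.txt"
  git -C "$work/remote" add .
  # Give each commit its own day so the date cutoff is unambiguous.
  GIT_AUTHOR_DATE="2021-01-0$i 12:00:00 +0000" \
  GIT_COMMITTER_DATE="2021-01-0$i 12:00:00 +0000" \
    git -C "$work/remote" commit -q -m "commit $i"
done
# Stored at deployment time: the SHA (commit 2) and a cutoff date before it.
deployed=$(git -C "$work/remote" rev-parse HEAD~3)
deployed_date="2021-01-02 00:00:00 +0000"

# Clone only the history committed on or after the cutoff.
git clone -q --shallow-since="$deployed_date" "file://$work/remote" "$work/ci"

changed=$(git -C "$work/ci" diff --name-only "$deployed" HEAD)
echo "$changed"
```

Because the cutoff is compared against commit dates, commits with unusual dates (rebased or back-dated ones, for instance) may need a more generous cutoff.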



Source: https://stackoverflow.com/questions/43793887/git-find-modified-files-since-ref-from-a-shallow-clone
