Is it possible to slim a .git repository without rewriting history?

礼貌的吻别 2020-12-15 05:24

We have a number of git repositories which have grown to an unmanageable size due to the historical inclusion of binary test files and Java .jar files…

4 Answers
  • 2020-12-15 05:48

    Sort of. You can use Git's replace feature to set aside the big bloated history so that it is only downloaded if needed. It's like a shallow clone, but without a shallow clone's limitations.

    The idea is you reboot a branch by creating a new root commit, then cherry-pick the old branch's tip commit. Normally you would lose all of the history this way (which also means you don't have to clone those big .jar files), but if the history is needed you can fetch the historical commits and use git replace to seamlessly stitch them back in.

    See Scott Chacon's excellent blog post for a detailed explanation and walk-through.
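    A minimal sketch of the reboot-and-stitch idea, run in a throwaway repository so it can be tried safely. It assumes the history to set aside ends at the commit that removed the jars. It builds the new parentless root with `git commit-tree`, replays the later commits with `git rebase --onto` (rather than a single cherry-pick), and then stitches the old history back with `git replace`. This variant replaces the new root commit instead of pairing commits exactly as in the diagram further down. File names, commit messages and the `historical` branch are hypothetical.

```shell
#!/bin/sh
# Sketch: hide old history behind a new root commit, then stitch it back with git replace.
# All names (files, messages, branch "historical") are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

REPO=$(mktemp -d)/repo
git init -q "$REPO"
cd "$REPO"
git symbolic-ref HEAD refs/heads/master

# A tiny "bloated" history; big.jar stands in for real binaries.
echo readme > README
echo v1 > big.jar
git add README big.jar && git commit -q -m "add a jar"
echo v2 > big.jar && git commit -q -am "modify a jar"
git rm -q big.jar && git commit -q -m "remove all of the big .jar files"
OLD_BASE=$(git rev-parse HEAD)   # last commit of the history we want to set aside
echo foo > foo && git add foo && git commit -q -m "modify foo"
echo bar > bar && git add bar && git commit -q -m "modify bar"

# Keep the full history reachable under another name (in practice: a separate repo).
git branch historical master

# 1. New parentless root with the same tree as OLD_BASE and a note for users.
ROOT=$(git commit-tree -m "instructions: fetch 'historical' and run git replace" "$OLD_BASE^{tree}")

# 2. Replay the commits after OLD_BASE onto the new root; master now has a short history.
git rebase -q --onto "$ROOT" "$OLD_BASE" master

# 3. Anyone who has the historical commits can stitch them back in locally.
git replace "$ROOT" "$OLD_BASE"

echo "with replace:    $(git rev-list --count master) commits"
echo "without replace: $(git --no-replace-objects rev-list --count master) commits"
```

    Run as-is, it reports 5 commits with the replacement active and 3 without. In a real setup each user would run the `git replace` step themselves after fetching the historical commits, since `refs/replace/*` refs are not fetched by the default refspec.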

    Advantages of this approach:

    • History is not modified. If you need to go back to an older commit complete with its big .jars and everything, you still can.
    • If you don't need to look at the old history, the size of your local clone is nice and small, and any fresh clones you make won't require downloading tons of mostly-useless data.

    Disadvantages of this approach:

    • The complete history is not available by default—users need to jump through some hoops to get at the history.
    • If you do need frequent access to the history, you'll end up downloading the bloated commits anyway.
    • This approach still has some of the same problems as rewriting history. For example, if your new repository looks like this:

      * modify bar (master)
      |
      * modify foo  <--replace-->  * modify foo (historical/master)
      |                            |
      * instructions               * remove all of the big .jar files
                                   |
                                   * add another jar
                                   |
                                   * modify a jar
                                   |
      

      and someone has an old branch off of the historical branch that they merge in:

      * merge feature xyz into master (master)
      |\__________________________
      |                           \
      * modify bar                 * add feature xyz
      |                            |
      * modify foo  <--replace-->  * modify foo (historical/master)
      |                            |
      * instructions               * remove all of the big .jar files
                                   |
                                   * add another jar
                                   |
                                   * modify a jar
                                   |
      

      then the big historical commits will reappear in your main repository and you're back to where you started. Note that this is no worse than rewriting history—someone might accidentally merge in the pre-rewrite commits.

      This can be mitigated by adding an update hook in your shared repository to reject any pushes that would reintroduce the historical root commit(s).
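    A sketch of such a hook, exercised end to end against throwaway repositories. Git runs `hooks/update` once per pushed ref with the arguments `<refname> <oldrev> <newrev>`, and a non-zero exit rejects that ref. The check uses `git merge-base --is-ancestor`, which exits 0 only when the first commit is reachable from the second. Here the forbidden commit is the root of a hypothetical old history; in a real repository you would list the actual historical root ID(s). Paths, branch names and messages are assumptions.

```shell
#!/bin/sh
# Sketch: an update hook that refuses pushes reintroducing an old root commit.
# All paths, branches and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

BASE=$(mktemp -d)
git init -q "$BASE/work"
cd "$BASE/work"
git symbolic-ref HEAD refs/heads/master
echo binary > lib.jar && git add lib.jar && git commit -q -m "historical root with a jar"
OLD_ROOT=$(git rev-parse HEAD)        # the commit that must never come back
git checkout -q --orphan clean        # start a fresh, rewritten history
git rm -q -f lib.jar                  # -f: on an orphan branch every entry counts as staged
echo code > Main.java && git add Main.java && git commit -q -m "new root"

git init -q --bare "$BASE/server.git"
# Install the hook; \$ keeps variables for the hook, $OLD_ROOT is filled in now.
cat > "$BASE/server.git/hooks/update" <<EOF
#!/bin/sh
refname=\$1
newrev=\$3
# A ref deletion (all-zero newrev) cannot reintroduce anything.
case "\$newrev" in *[!0]*) ;; *) exit 0 ;; esac
for bad in $OLD_ROOT; do
  if git merge-base --is-ancestor "\$bad" "\$newrev" 2>/dev/null; then
    echo "rejected \$refname: contains pre-rewrite commit \$bad" >&2
    exit 1
  fi
done
exit 0
EOF
chmod +x "$BASE/server.git/hooks/update"

git push -q "$BASE/server.git" clean:master       # new history: accepted
if git push -q "$BASE/server.git" master:old 2>/dev/null; then
  echo "unexpected: old history was accepted"
else
  echo "old history rejected"
fi
```

    The first push succeeds because the old root is not in the new history. The second is refused because the pushed branch contains it, so a stale branch built on the old history cannot be merged and pushed back in.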

  • 2020-12-15 05:48

    I don't know of a solution which would avoid rewriting the history.

    In that case, cleaning the repo with a tool like BFG Repo-Cleaner is the easiest solution (easier than git filter-branch).
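    BFG is a separate Java program. Its documented flow is to run it on a fresh `git clone --mirror`, for example `java -jar bfg.jar --delete-files '*.jar' my-repo.git`, then run `git reflog expire --expire=now --all && git gc --prune=now --aggressive` so the old objects are really deleted. By default BFG leaves the files in your current HEAD commit untouched. Because BFG is an extra download, the runnable sketch below does the same cleanup with the built-in `git filter-branch` on a throwaway repository. The file names are hypothetical.

```shell
#!/bin/sh
# Sketch: strip every *.jar from all history with the built-in git filter-branch.
# This rewrites history: every commit ID changes.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip newer Git's warning and delay

REPO=$(mktemp -d)/repo
git init -q "$REPO"
cd "$REPO"
git symbolic-ref HEAD refs/heads/master
echo code > Main.java
echo binary > lib.jar
git add Main.java lib.jar && git commit -q -m "add code and a jar"
echo more >> Main.java && git commit -q -am "edit code"

# Remove *.jar from the index of every commit on every ref; --ignore-unmatch
# keeps commits without jars from failing, --prune-empty drops emptied commits.
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch -- "*.jar"' --prune-empty -- --all

# Drop the backup ref and the reflog entries that still point at the old objects,
# then let gc actually delete the now-unreachable blobs.
git update-ref -d refs/original/refs/heads/master
git reflog expire --expire=now --all
git gc -q --prune=now
```

    Afterwards `lib.jar` is gone from every commit and from the object database, but all commit IDs have changed, which is exactly the rewrite the question hoped to avoid.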

  • 2020-12-15 05:54

    I honestly can't think of a way to do that. If you think about what Git "promises" you as a user with regard to data integrity, I can't see how you could remove a file from the repository and keep the same commit hashes. In other words, if what you're asking were possible, Git would be a lot less reliable...
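    A small sketch of the integrity argument. A commit ID is a hash over its tree (which covers every file) and its parent IDs, so deleting a file from an old commit changes that commit's ID and, through the parent links, every ID after it. The script builds two throwaway repositories whose final commits have identical trees, messages, authors and fixed timestamps. They differ only in whether the first commit contained a hypothetical `lib.jar`.

```shell
#!/bin/sh
# Sketch: object IDs chain together, so removing a blob from an old commit
# changes the ID of that commit and of every commit built on top of it.
set -e
# Fixed identity and timestamps so that only content and parents can differ.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='1577836800 +0000' GIT_COMMITTER_DATE='1577836800 +0000'

# build_history DIR WITH_JAR: two commits; only the first one differs.
build_history() {
  git init -q "$1"
  (
    cd "$1"
    echo code > Main.java
    git add Main.java
    if [ "$2" = yes ]; then echo binary > lib.jar; git add lib.jar; fi
    git commit -q -m "first"
    if [ "$2" = yes ]; then git rm -q lib.jar; fi   # jar gone again in commit 2
    echo more >> Main.java
    git commit -q -am "second"
  )
}

BASE=$(mktemp -d)
build_history "$BASE/with-jar" yes
build_history "$BASE/without-jar" no

# Same final tree in both repositories...
git -C "$BASE/with-jar" rev-parse 'HEAD^{tree}'
git -C "$BASE/without-jar" rev-parse 'HEAD^{tree}'
# ...but different commit IDs, because the parents differ.
A=$(git -C "$BASE/with-jar" rev-parse HEAD)
B=$(git -C "$BASE/without-jar" rev-parse HEAD)
echo "$A"
echo "$B"
```

    The two tree IDs match while the two commit IDs do not. The jar that no longer exists at the tip still determines the tip's hash.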

  • 2020-12-15 05:59

    No, that is not possible; you will have to rewrite history. But here are some pointers for that:

    • As VonC mentioned, if it fits your scenario, use BFG Repo-Cleaner; it’s a lot easier to use than git filter-branch.
    • You do not need to clone again! Just run these commands instead of git pull and you will be fine (replace origin and master with your remote and branch):

      git fetch origin
      git reset --hard origin/master
      

      But note that, unlike with git pull, you will lose all local changes that have not been pushed to the server yet.

    • It helps a lot if at least one person on your team (you or someone else) fully understands how git sees history, and what git pull, git merge and git rebase (including git rebase --onto) do. Then give everybody involved a quick training on how to handle this rewrite situation (5-10 minutes should be enough for the basic dos and don’ts).
    • Be aware that git filter-branch does no harm in itself, but it makes many standard workflows harmful. If people don’t act accordingly and merge the old history back in, you may have to rewrite history again unless you notice soon enough.
    • You can prevent people from merging (more precisely, pushing) the old history by writing an appropriate update hook on the server (about 5 lines): just check whether the history of the pushed head contains a specific old commit.
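    The resync step from the second bullet can be rehearsed with throwaway repositories before trying it on a real team. In this sketch a hypothetical maintainer rewrites the only commit (an amend stands in for BFG or filter-branch) and force-pushes. The teammate's clone then catches up with `git fetch` plus `git reset --hard` instead of re-cloning. All paths and names are assumptions.

```shell
#!/bin/sh
# Sketch: a clone catching up with a rewritten remote without re-cloning.
# All paths and names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
BASE=$(mktemp -d)

# Maintainer's repository: publish a history that contains a jar.
git init -q "$BASE/admin"
cd "$BASE/admin"
git symbolic-ref HEAD refs/heads/master
echo code > Main.java && echo binary > lib.jar
git add Main.java lib.jar && git commit -q -m "add code and a jar"
git init -q --bare "$BASE/server.git"
git --git-dir="$BASE/server.git" symbolic-ref HEAD refs/heads/master
git push -q "$BASE/server.git" master

# A teammate clones the original history.
git clone -q "$BASE/server.git" "$BASE/dev"

# The maintainer rewrites history (here: amends the jar away) and force-pushes.
git rm -q lib.jar
git commit -q --amend -m "add code"
git push -q --force "$BASE/server.git" master

# The teammate resyncs instead of re-cloning; unpushed local work would be lost here.
cd "$BASE/dev"
git fetch -q origin
git reset -q --hard origin/master
```

    Afterwards the teammate's HEAD matches the rewritten commit and `lib.jar` is gone from the working tree. A plain `git pull` would instead try to reconcile the old and rewritten histories rather than discard the old one, which is how the old commits can come back.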