We have a number of git repositories which have grown to an unmanageable size due to the historical inclusion of binary test files and java .jar fi
Sort of. You can use Git's replace feature to set aside the big bloated history so that it is only downloaded if needed. It's like a shallow clone, but without a shallow clone's limitations.
The idea is you reboot a branch by creating a new root commit, then cherry-pick the old branch's tip commit. Normally you would lose all of the history this way (which also means you don't have to clone those big .jar files), but if the history is needed you can fetch the historical commits and use git replace to seamlessly stitch them back in.
See Scott Chacon's excellent blog post for a detailed explanation and walk-through.
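As a rough illustration of the steps described above, here is a minimal sketch run against a throwaway repository. The file names, commit messages, and temporary directory layout are made up for the demo; only the `historical/master` branch name is taken from the diagrams in this answer. It is not a transcript of the blog post's walk-through, just one way the pieces could fit together.

```shell
#!/bin/sh
# Sketch only: repository layout, file names, and commit messages are hypothetical.
set -eu

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Bloated history: a jar is added, later removed, then normal work continues.
echo "pretend this is huge" > big.jar
git add big.jar
git commit -q -m "add a jar"
git rm -q big.jar
echo foo > foo
git add foo
git commit -q -m "remove all of the big .jar files"
echo foo2 > foo
git commit -q -am "modify foo"

# 1. Keep the old history reachable under a separate branch.
old_tip=$(git rev-parse HEAD)
git branch historical/master "$old_tip"

# 2. Create a parentless commit that reuses the tree of the old tip's parent.
new_root=$(git commit-tree -m "instructions: fetch historical/master for full history" "${old_tip}~1^{tree}")

# 3. Restart master at the new root, cherry-pick the old tip, and keep working.
git checkout -q -B master "$new_root"
git cherry-pick "$old_tip" >/dev/null
echo bar > bar
git add bar
git commit -q -m "modify bar"

# 4. A fresh clone of master alone does not download the jar history.
git clone -q --single-branch --branch master "file://$work/origin" "$work/clone"
cd "$work/clone"
short_count=$(git rev-list --count master)

# 5. Opt in to the full history: fetch the old branch and stitch it in.
git fetch -q origin historical/master:historical/master
git replace master~1 historical/master
full_count=$(git rev-list --count master)

echo "without history: $short_count commits, with history: $full_count commits"
```

The replacement ref lives only in the clone that created it, so other clones stay small unless they fetch the historical branch and run `git replace` themselves.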
Advantages of this approach:
.jars and everything, you still can.
Disadvantages of this approach:
This approach still has some of the same problems as rewriting history. For example, if your new repository looks like this:
* modify bar (master)
|
* modify foo  <--replace-->  * modify foo (historical/master)
|                            |
* instructions               * remove all of the big .jar files
                             |
                             * add another jar
                             |
                             * modify a jar
                             |
and someone has an old branch off of the historical branch that they merge in:
* merge feature xyz into master (master)
|\__________________________
|                           \
* modify bar                 * add feature xyz
|                            |
* modify foo  <--replace-->  * modify foo (historical/master)
|                            |
* instructions               * remove all of the big .jar files
                             |
                             * add another jar
                             |
                             * modify a jar
                             |
then the big historical commits will reappear in your main repository and you're back to where you started. Note that this is no worse than rewriting history: there, too, someone might accidentally merge in the pre-rewrite commits.
This can be mitigated by adding an update hook in your shared repository to reject any pushes that would reintroduce the historical root commit(s).
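One possible shape for such a hook is sketched below, together with a small demo that installs it in a throwaway bare repository. The config key `hooks.forbiddenroot`, the exemption for `refs/heads/historical/*`, and the `feature-xyz` branch name are assumptions made for this example, not git built-ins. The check itself relies on `git merge-base --is-ancestor`.

```shell
#!/bin/sh
# Sketch only: hooks.forbiddenroot and the historical/* exemption are
# assumptions made for this demo, not git built-ins.
set -eu

work=$(mktemp -d)
git init -q --bare "$work/shared.git"
mkdir -p "$work/shared.git/hooks"

# Install the update hook. git runs it once per pushed ref as:
#   update <refname> <old-sha> <new-sha>
cat > "$work/shared.git/hooks/update" <<'HOOK'
#!/bin/sh
refname=$1
newrev=$3

# Allow ref deletions (new-sha is all zeros) and the archive branches themselves.
case "$newrev" in *[!0]*) ;; *) exit 0 ;; esac
case "$refname" in refs/heads/historical/*) exit 0 ;; esac

# Reject the push if any forbidden root is an ancestor of the new tip.
for root in $(git config --get-all hooks.forbiddenroot); do
    if git --no-replace-objects merge-base --is-ancestor "$root" "$newrev" 2>/dev/null; then
        echo "rejected: $refname would reintroduce historical root $root" >&2
        exit 1
    fi
done
exit 0
HOOK
chmod +x "$work/shared.git/hooks/update"

# Demo repository: an old root commit with a jar, then a rebooted master.
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "pretend this is huge" > big.jar
git add big.jar
git commit -q -m "add a jar"
old_root=$(git rev-parse HEAD)
git branch historical/master

new_root=$(git commit-tree -m "instructions" "HEAD^{tree}")
git checkout -q -B master "$new_root"

# A stale feature branch that still sits on top of the old history.
git checkout -q -b feature-xyz historical/master
echo xyz > xyz
git add xyz
git commit -q -m "add feature xyz"

# Tell the shared repository which root commit must not come back.
git -C "$work/shared.git" config --add hooks.forbiddenroot "$old_root"
git remote add origin "$work/shared.git"

if git push -q origin master 2>/dev/null; then clean_push=accepted; else clean_push=rejected; fi
if git push -q origin feature-xyz 2>/dev/null; then old_push=accepted; else old_push=rejected; fi
echo "rebooted master: $clean_push, branch with old history: $old_push"
```

An update hook runs once per ref, so a push that mixes clean and stale branches only has the stale ones refused. If the historical branches live in a separate repository, the `historical/*` exemption can simply be dropped.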