Git/rsync mix for projects with large binaries and text files

Question

Is anyone aware of a project that effectively combines git version control for text-based files with something like rsync for large binary files (such as data)? Obviously, this goes a little beyond what a DVCS should do, but I am curious whether anyone has written a smart wrapper around git that syncs such files with a central repository.

Answer 1:

You might like git-annex. From its homepage:

git-annex allows managing files with git, without checking the file contents into git. While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, time, or disk space.

Even without file content tracking, being able to manage files with git, move files around and delete files with versioned directory trees, and use branches and distributed clones, are all very handy reasons to use git. And annexed files can co-exist in the same git repository with regularly versioned files, which is convenient for maintaining documents, Makefiles, etc that are associated with annexed files but that benefit from full revision control.
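
To make the idea of "managing files with git without checking the contents into git" more concrete, here is a minimal Python sketch of the general pattern: the large content is moved into a separate store keyed by its hash, and only a tiny pointer file stays in the working tree to be versioned. This is not git-annex's actual on-disk layout or API; the names annex_file, read_annexed, and the "annex-pointer" text format are made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large binaries never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def annex_file(path: Path, store: Path) -> str:
    """Move the content of `path` into `store` and leave a small pointer behind.

    Hypothetical helper: the pointer format below is an assumption,
    not what git-annex writes.
    """
    digest = file_digest(path)
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if target.exists():
        path.unlink()                      # identical content is already stored
    else:
        shutil.move(str(path), str(target))
    # The pointer is the only thing a version control system would need to track.
    path.write_text(f"annex-pointer sha256:{digest}\n")
    return digest


def read_annexed(pointer: Path, store: Path) -> bytes:
    """Follow a pointer file back to the stored content."""
    key = pointer.read_text().strip().split("sha256:", 1)[1]
    return (store / key).read_bytes()
```

With this split, the pointer files can be committed like any other text file, while the store directory is synced by whatever transport suits large files (rsync, for instance).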

Answer 2:

The last one I saw is called bup:

bup is a program that backs things up. It's short for "backup."

bup has a few advantages over other backup software:

It uses a rolling checksum algorithm (similar to rsync) to split large files into chunks. The most useful result of this is you can backup huge virtual machine (VM) disk images, databases, and XML files incrementally, even though they're typically all in one huge file, and not use tons of disk space for multiple versions.

It uses the packfile format from git (the open source version control system), so you can access the stored data even if you don't like bup's user interface.

Unlike git, it writes packfiles directly (instead of having a separate garbage collection / repacking stage) so it's fast even with gratuitously huge amounts of data. bup's improved index formats also allow you to track far more filenames than git (millions) and keep track of far more objects (hundreds or thousands of gigabytes).

Data is "automagically" shared between incremental backups without having to know which backup is based on which other one - even if the backups are made from two different computers that don't even know about each other. You just tell bup to back stuff up, and it saves only the minimum amount of data needed.
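
The rolling-checksum chunking and the automatic sharing of data between backups described above can be sketched in a few lines of Python. This is only an illustration of the technique under simplified assumptions, not bup's actual code: the checksum is a plain sum over a sliding window, the WINDOW and MASK values are arbitrary, and ChunkStore is a hypothetical in-memory stand-in for a real object store.

```python
import hashlib
import random
from collections import deque
from typing import Dict, List

WINDOW = 16   # bytes covered by the rolling checksum (illustrative value)
MASK = 0x3F   # cut when the low 6 bits are all ones (illustrative value)


def split_chunks(data: bytes) -> List[bytes]:
    """Split data at positions chosen by a rolling checksum of recent bytes.

    Because a cut depends only on the last WINDOW bytes, inserting data
    near the start of a file shifts later cut points along with the
    content instead of changing every chunk after the edit.
    """
    chunks = []
    window = deque()
    rolling = 0
    start = 0
    for i, byte in enumerate(data):
        window.append(byte)
        rolling += byte
        if len(window) > WINDOW:
            rolling -= window.popleft()    # slide the window forward
        if (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])        # trailing bytes after the last cut
    return chunks


class ChunkStore:
    """Hypothetical content-addressed store: each chunk is kept once, keyed by hash."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def backup(self, data: bytes) -> List[str]:
        keys = []
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            self.objects.setdefault(key, chunk)   # already-known chunks cost nothing
            keys.append(key)
        return keys

    def restore(self, keys: List[str]) -> bytes:
        return b"".join(self.objects[k] for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = b"a few inserted bytes" + original   # simulate a small edit

    store = ChunkStore()
    store.backup(original)
    before = len(store.objects)
    store.backup(edited)
    print("chunks after first backup:", before)
    print("new chunks for second backup:", len(store.objects) - before)
```

The second backup does not need to know it is "based on" the first one: any chunk whose hash is already in the store is simply not stored again, which is the sharing behaviour the quoted text describes. Real tools typically also enforce minimum and maximum chunk sizes, which this sketch leaves out.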

Source: https://stackoverflow.com/questions/11281678/git-rsync-mix-for-projects-with-large-binaries-and-text-files

Tags

git

rsync

large-files