Is it possible to add and commit files outside of a git repository?

Question

We have text files spread across every corner of the system, and we were planning to record every modification made to these files in a git repository.

Every modification to these files is made by a script, so we were planning to add new commands to that script that add the files to a git repository. However, these modifications happen concurrently.

From each file's original path, we could build a path inside the repository that represents the file's original location.

Is it possible to add these files to a git repository concurrently?

What we have in mind is an atomic operation that combines add and commit and takes both the external file path and the corresponding path inside the repository. Something like:

git --user="Script1 <script1@localhost>" --git-dir=/home/repo/filescollection.git/.git add --external-path=/home/user1/file.txt --repo-path=home_user1_files.txt

Answer 1:

The answer is both no and yes.¹

If you plan to use only Git "porcelain" commands, it's pretty clearly "no", as these work with the concept of a (single) work-tree that holds all the normal-format files, plus one index (holding the current state of that work-tree and building the next commit). There is one HEAD file holding the notion of the current branch name. You need at least two separate porcelain commands, in this sequence:

git add <path>
git commit <arguments>

to update the (single) index from the (single) work-tree version of the file in <path>, then make a commit using that index and the current HEAD. Git will do some locking of the things it updates while making the commit, but you need the add-then-commit sequence to appear atomic, so you need to pile your own locking atop these.
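A minimal sketch of that extra locking, assuming every writer goes through the same helper. The function name locked_add_commit, the lock directory .git/add-commit.lock.d, and the Script1 identity are hypothetical choices, not anything Git defines. mkdir serves as the lock because creating a directory either succeeds or fails atomically:

```shell
#!/bin/sh
# Hypothetical helper: run "git add" then "git commit" while holding one lock,
# so concurrent callers of this helper cannot interleave between the two steps.
# Usage: locked_add_commit <work-tree> <path> <message>
locked_add_commit() {
    work_tree=$1
    path=$2
    msg=$3
    # Refuse to spin forever if this is not a repository with a .git directory.
    [ -d "$work_tree/.git" ] || return 1
    # Assumed lock location; every writer must agree on it.
    lock_dir="$work_tree/.git/add-commit.lock.d"
    # mkdir is atomic: it fails while another process holds the lock.
    until mkdir "$lock_dir" 2>/dev/null; do
        sleep 1
    done
    # The two porcelain steps from the text, with the identity supplied inline.
    git -C "$work_tree" add -- "$path" &&
        git -C "$work_tree" -c user.name="Script1" -c user.email="script1@localhost" \
            commit -q -m "$msg"
    status=$?
    # Release the lock whether or not add/commit succeeded.
    rmdir "$lock_dir"
    return "$status"
}
```

If a script dies while holding the lock, the directory stays behind and every later writer waits forever, so a real setup would need some way to detect stale locks (flock(1), where available, releases its lock automatically when the holding process exits).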

(This remains true even if you use --work-tree and/or --git-dir arguments to redirect various parts of various steps: the shared index has to remain stable between the "add" and "commit" steps.)

On the other hand, if you are willing to step outside the comfort of pure porcelain, you can get the commit itself done as an atomic entity—but you're still looking at a race of sorts, so you need to resolve that before the answer really changes from "no" to "yes". To see how this works we must take the git add and git commit steps apart.

First, git add is essentially git update-index. We can create a new, temporary, private index and populate it from some specific commit we choose:

commit_id=...insert some magic here, see below...
export GIT_INDEX_FILE=$(mktemp) # remember to clean it up later too
git read-tree $commit_id

Now we can replace any given file within that index using git update-index (or in fact, the more familiar and comfortable git add: the environment variable works there too). Because this is our own private index, it is insulated from all other processes that may be modifying any other index.
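For the question's use case, this is the step where an external file can be placed at any path inside the repository. A minimal sketch, assuming a hypothetical helper named stage_external: it stores the file's content with git hash-object -w --stdin (so the file never has to live inside a work tree) and records the resulting blob under the chosen repository path with git update-index --add --cacheinfo. It assumes every file is a regular, non-executable file (mode 100644), and the caller picks the private index path, for example a file inside a fresh mktemp -d directory, rather than handing Git the empty file that plain mktemp creates:

```shell
#!/bin/sh
# Hypothetical helper: build a private index from <base-commit>, then record
# <external-file> in it under <repo-path>. No other index is touched.
# Usage: stage_external <git-dir> <index-file> <base-commit> <external-file> <repo-path>
stage_external() {
    git_dir=$1
    index=$2
    base=$3
    ext_file=$4
    repo_path=$5
    # Seed the private index with the tree of the base commit.
    GIT_INDEX_FILE=$index git --git-dir="$git_dir" read-tree "$base" || return 1
    # Store the file's current content as a blob; -w writes it to the object store,
    # and --stdin means the external path is never interpreted relative to a work tree.
    blob=$(git --git-dir="$git_dir" hash-object -w --stdin < "$ext_file") || return 1
    # Add or replace the entry for repo_path (assumed regular file, mode 100644).
    GIT_INDEX_FILE=$index git --git-dir="$git_dir" update-index --add \
        --cacheinfo "100644,$blob,$repo_path"
}
```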

Now we can do the steps that git commit does:

tree_id=$(git write-tree)

This turns the index (which is now our temporary index) into a new top-level tree, with sub-trees for any sub-directories, all based on what we read into the index earlier (with git read-tree) and updated (with git update-index or git add). This top-level tree, and any necessary sub-trees that were not already in the repository, are now stored in the repository. The new objects are safe from automatic git gc for the configured expiration time (gc.pruneExpire, default 14 days), so this is how long we have to finish our commit. The command prints the new tree's ID to its standard output, which we capture in the $tree_id variable.

Next, we need to write a commit object, referring to the tree we just made, with an appropriate parent hash. The correct parent hash is obviously $commit_id. We must construct a commit message and then run:

new=$(git commit-tree -p $commit_id $tree_id < message_file)

or similar. This writes the commit object into the repository and, just like git write-tree, prints the new object's ID, which we capture into $new. (Note that this step uses the author and committer name and email; you can supply them as -c user.name=... and -c user.email=... arguments, which go between git and the commit-tree subcommand.)
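A sketch of the write-tree and commit-tree steps together, assuming a hypothetical helper named commit_from_index and the Script1 identity from the question. It feeds the message on standard input, like the message_file redirection above, and deliberately updates no reference, so the result stays an unreferenced commit until the final step:

```shell
#!/bin/sh
# Hypothetical helper: write the private index as tree objects, then create a
# commit whose only parent is <parent>. Prints the new commit ID; no ref moves.
# Usage: commit_from_index <git-dir> <index-file> <parent> <message>
commit_from_index() {
    git_dir=$1
    index=$2
    parent=$3
    msg=$4
    # Store the index as trees and capture the top-level tree ID.
    tree_id=$(GIT_INDEX_FILE=$index git --git-dir="$git_dir" write-tree) || return 1
    # -c options go before the subcommand; they set author and committer identity.
    # The message arrives on standard input.
    printf '%s\n' "$msg" |
        git --git-dir="$git_dir" -c user.name="Script1" -c user.email="script1@localhost" \
            commit-tree -p "$parent" "$tree_id"
}
```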

Last, and most important, we're ready to record this new object somewhere. This is where we must resolve our race (each of the object-writing steps did its own locking to make sure that that part was appropriately atomic).

I assume you would like to store these under some branch name(s), and that these branch names may be both read and updated by other processes. (If they are read-only, never updated by anything else, we are now home free.) We have an atomic update operation, in the form of git update-ref:

git update-ref [-m <reason>] <refname> <newvalue> <oldvalue>

The optional -m <reason> part is stored in the reflog, if there is a reflog for this reference. (This step also uses user.name and user.email, so supply them here if desired.) The refname part is the full name of the reference, e.g., refs/heads/branch for branch branch. The newvalue part is the hash ID we want to store, and the oldvalue part—which we will supply to check for races—is the value we expect that branch name to store right now.

Now, assuming we are racing some other process, there are two possible cases:

We won the race: the tree we read, back at the start, is the tree that goes with the commit that's currently at the tip of the branch. Our commit is therefore ready to be added to the branch, in a straightforward linear fashion.

or:

We lost the race: the tree we read, back at the start, is valid, but the branch name now points to a newer commit. Our commit is therefore worthless, or needs to be put on a side branch, or something. We could start over and do the whole thing over again, if our commit is truly worthless: maybe this time we will win the race.
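The compare-and-swap behavior that separates these two cases can be seen with a thin wrapper around git update-ref (the name publish and the identity are hypothetical). If the reference still holds the expected old value, the update succeeds and we won; if another process has already moved it, git update-ref exits with a non-zero status and leaves the reference untouched, and we lost:

```shell
#!/bin/sh
# Hypothetical wrapper: move <refname> from <old> to <new> atomically.
# Exits non-zero, leaving the ref alone, if <refname> no longer holds <old>.
# Usage: publish <git-dir> <refname> <new> <old> <reason>
publish() {
    # Identity is used for the reflog entry written with -m.
    git --git-dir="$1" -c user.name="Script1" -c user.email="script1@localhost" \
        update-ref -m "$5" "$2" "$3" "$4"
}
```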

What to do about the "lost the race" case is up to you. But now we see where the "magic" comes from: the commit ID we want, when we start this whole process, is the current commit hash associated with the reference. So the "magic" is just:

commit_id=$(git rev-parse $refname)

which reads the current value of the reference (if it's a branch name, we may assume that the type of the underlying object is commit).
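If you would rather check that assumption than rely on it, git rev-parse can do so: with --verify and the ^{commit} suffix it prints an ID only when the reference exists and resolves to a commit (an annotated tag is peeled to the commit it points at), and exits non-zero otherwise; -q suppresses the error message. The helper name current_commit is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the commit ID that <refname> currently names,
# or fail if the ref is missing or does not resolve to a commit.
# Usage: current_commit <git-dir> <refname>
current_commit() {
    git --git-dir="$1" rev-parse --verify -q "$2^{commit}"
}
```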

Since the update-ref step has its own atomicity (enforced via locking), that's where we get our atomicity. The question of what to do on failures, though, is the hard part. Remember also to consider, and do something about, failures at each intermediate step, e.g., if git rev-parse fails, or any of git read-tree or git write-tree or git commit-tree fail, as well.
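Putting the steps together, here is a minimal end-to-end sketch. Several choices in it are assumptions rather than part of the answer: the helper name commit_external, the Script1 identity, mode 100644 for every file, a limit of five attempts, and taking the "start over" option whenever update-ref fails. Starting over suits the question's setup, because rebuilding on the newer tip keeps the other writer's commit intact instead of discarding it. The branch must already exist:

```shell
#!/bin/sh
# Hypothetical end-to-end sketch: commit <external-file> as <repo-path> on
# <refname> without touching the shared index or HEAD, retrying on a lost race.
# Usage: commit_external <git-dir> <refname> <external-file> <repo-path> <message>
commit_external() {
    git_dir=$1 refname=$2 ext_file=$3 repo_path=$4 msg=$5
    tmp=$(mktemp -d) || return 1
    # The blob depends only on the file's content, so write it once.
    blob=$(git --git-dir="$git_dir" hash-object -w --stdin < "$ext_file") || {
        rm -rf "$tmp"; return 1; }
    attempt=0
    while [ "$attempt" -lt 5 ]; do  # assumed retry limit
        attempt=$((attempt + 1))
        rm -f "$tmp/index"
        # Read the current tip (the "magic"), fill a private index from it,
        # put the blob at repo_path, write the tree, and create the commit.
        commit_id=$(git --git-dir="$git_dir" rev-parse --verify -q "$refname^{commit}") &&
        GIT_INDEX_FILE=$tmp/index git --git-dir="$git_dir" read-tree "$commit_id" &&
        GIT_INDEX_FILE=$tmp/index git --git-dir="$git_dir" update-index --add \
            --cacheinfo "100644,$blob,$repo_path" &&
        tree_id=$(GIT_INDEX_FILE=$tmp/index git --git-dir="$git_dir" write-tree) &&
        new=$(printf '%s\n' "$msg" | git --git-dir="$git_dir" \
            -c user.name="Script1" -c user.email="script1@localhost" \
            commit-tree -p "$commit_id" "$tree_id") || { rm -rf "$tmp"; return 1; }
        # Atomic compare-and-swap: succeeds only if the ref still holds commit_id.
        if git --git-dir="$git_dir" -c user.name="Script1" -c user.email="script1@localhost" \
            update-ref -m "$msg" "$refname" "$new" "$commit_id" 2>/dev/null; then
            rm -rf "$tmp"
            echo "$new"
            return 0
        fi
        # Lost the race: drop this commit (gc reclaims it) and rebuild on the new tip.
    done
    rm -rf "$tmp"
    return 1
}
```

With the names from the question, a call might look like commit_external /home/repo/filescollection.git/.git refs/heads/master /home/user1/file.txt home_user1_files.txt "update from Script1" (the branch name master is an assumption). A failure in any intermediate step cleans up and returns non-zero, while any update-ref failure, not only a lost race, is treated as a reason to retry until the limit is reached.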

¹Go not to the Elves for counsel, for they will say both no and yes.

Answer 2:

No, it's not possible. You may consider either creating one huge git repository, multiple git repositories, rethinking your file structure to contain it all in one directory (if possible), or using something like Docker to create an image of the entire computer.

Source: https://stackoverflow.com/questions/43104726/is-it-possible-to-add-and-commit-files-outside-of-a-git-repository

Tags

git