Does anyone have an example of using git hash-object on a directory? It works easily enough on a file, but doesn't work as I'd expect for a directory.
I had the same problem and hacked up a Python script to hash a complete directory. It's limited in the sense that it doesn't take the .gitignore file into account, but it serves its purpose so far (hash directory, make commit object, store it on the gh-pages branch).
git hash-object -t tree is expecting the file parameter to be a file that describes the entries in the tree, rather than a directory in the filesystem. I understand from the comment here that this command is expecting a file that describes the tree in a binary format, and that it would be easier to use git mktree for you to create the tree object.
git mktree understands input of the format you get from (for example) git ls-tree HEAD. There is a nice example of constructing a tree from scratch using git hash-object and git mktree in the Git Community Book.
As Mark Longair said, mktree is the way to go.
I had the same problem and had to struggle a lot to fix it. This is what I did:
git ls-files -s directory_path
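This will give you a list of the contents of the directory with their hashes, one line per file in the form "mode hash stage<TAB>path".
You can then turn this list into ls-tree format ("mode type hash<TAB>path") in a text editor and pipe it to git mktree:
echo -e "{ls-tree format list}" | git mktree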
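If you would rather not do the conversion by hand in a text editor, it can be scripted. The following is only a sketch: the function name ls_files_to_ls_tree and its prefix parameter are made up for illustration, and it assumes the directory is flat (nested paths would need their own subtrees built first).

```python
def ls_files_to_ls_tree(ls_files_output: str, prefix: str = "") -> str:
    """Convert `git ls-files -s` lines into `git ls-tree` style lines.

    Input lines look like:  "<mode> <hash> <stage>\t<path>"
    Output lines look like: "<mode> <type> <hash>\t<path>"
    `prefix` (hypothetical parameter) is stripped from the start of each
    path, e.g. "some/directory/".
    """
    out = []
    for line in ls_files_output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        mode, sha, _stage = meta.split(" ")
        # Mode 160000 marks a submodule (gitlink); everything else that
        # ls-files reports is a blob.
        obj_type = "commit" if mode == "160000" else "blob"
        if prefix and path.startswith(prefix):
            path = path[len(prefix):]
        out.append(f"{mode} {obj_type} {sha}\t{path}")
    return "\n".join(out) + ("\n" if out else "")
```

The returned text is what you would pipe into git mktree in place of the hand-edited list.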
I'm not sure about getting the hash for a directory outside of a git repository, but for a directory inside of a repository, try this to print only the hash:
git rev-parse HEAD:some/directory
There is no need to use other commands that require additional processing.
This will also work but provides additional information you may not want (such as the file mode and other data):
git ls-tree HEAD some/directory
Depending why you wish to do this, the following git command might be useful:
git ls-files -s somedirectory | git hash-object --stdin
This gives a single hash which takes into account the filenames and contents.
It works like this: git ls-files -s ... outputs a list of files and their hashes as text to stdout, then git hash-object generates a hash for the data it receives from stdin.
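To make it clearer what that pipeline computes, here is a rough Python equivalent of the hash-object step. It is a sketch that assumes a repository using SHA-1 object names; the function names are made up for illustration.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Hash `data` the way `git hash-object --stdin` does for a blob:
    SHA-1 over the header "blob <size>\\0" followed by the raw bytes."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def listing_hash(ls_files_output: str) -> str:
    """Hash the text printed by `git ls-files -s <dir>` as one blob.

    The result changes whenever any listed mode, file hash, or path changes.
    """
    return git_blob_hash(ls_files_output.encode("utf-8"))


print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Note that this is a hash of the listing text, not the tree hash git itself stores for the directory, so it is only comparable with other hashes produced the same way.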
My use case for this is the following - I want to know whether the (git managed) files in a directory in one branch exactly(*) match those in another branch. The specific use is to compare the "directory hashes" to decide whether I need to re-generate derived files which are cached.
By default git ls-files will list files in sub-directories too. If you don't want that, try looking at answers to "how to git ls-file for just one directory level". There are also various other options to git ls-files, including the ability to specify a list of files to include.
(*) excluding hash-collisions
I'd like to improve on @Fred Foo's answer by providing a modified version of his script, which differs in that it does not store the files and directories in the repository as a side effect of computing their hashes: http://pastebin.com/BSNGqsqC
Unfortunately I am not aware of any way to force git mktree to not create a tree object in the repository, so the code has to generate a binary representation of the tree and pass it to git hash-object -t tree.
This script is also based on answers to "What is the internal format of a git tree object?"
The general idea is to use git hash-object -- data.txt to get the hash of a file, and to use git hash-object --stdin -t tree < TreeDescription for a directory, where:
TreeDescription is a concatenation of entries of the form "mode name\0hash",
mode is "100644" for files, and "40000" for directories (note the lack of a leading zero in the case of a directory),
mode and name are separated by a single space,
name and hash are separated by a single byte \0,
hash is the 20-byte binary representation of the object hash,
entries are sorted by name, which seems not entirely necessary to create a tree object, but helps to determine if two directories are equivalent by comparing their hashes - unfortunately I am not aware which sorting algorithm should be used here (in particular: what to do in case of non-ASCII characters).
Also note that this binary format differs a little bit from the way a tree object is stored in the repository in that it lacks the "tree SIZE\0" header.
Obviously you have to compute this bottom-up, starting from deepest files, as you need hashes of all children before computing the hash of a parent.
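For illustration, the bottom-up computation could look roughly like the sketch below. This is not the pastebin script; the function names are made up, and it rests on several assumptions: SHA-1 object names, no .gitignore handling, no line-ending or filter conversion, and empty directories skipped because git does not track them. As far as I know, git orders entries by comparing names byte by byte, with a directory compared as if its name ended in "/", so that is what the sort key does here.

```python
import hashlib
import os
import stat


def hash_object(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of "<type> <size>\\0" + body, as git computes it."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def _sort_key(entry):
    mode, name, _sha = entry
    # Assumed git ordering: byte-wise, directories as if named "name/".
    return name + (b"/" if mode == b"40000" else b"")


def tree_body(path: str) -> bytes:
    """Build the binary "mode name\\0hash" listing for one directory."""
    entries = []
    with os.scandir(path) as it:
        for item in it:
            if item.name == ".git":
                continue
            name = os.fsencode(item.name)
            if item.is_symlink():
                # A symlink is stored as a blob holding the link target.
                target = os.fsencode(os.readlink(item.path))
                entries.append((b"120000", name, hash_object("blob", target)))
            elif item.is_dir():
                sub = tree_body(item.path)  # children first (bottom-up)
                if sub:  # skip directories with nothing to track
                    entries.append((b"40000", name, hash_object("tree", sub)))
            else:
                with open(item.path, "rb") as fh:
                    data = fh.read()
                executable = item.stat().st_mode & stat.S_IXUSR
                mode = b"100755" if executable else b"100644"
                entries.append((mode, name, hash_object("blob", data)))
    return b"".join(
        mode + b" " + name + b"\0" + sha
        for mode, name, sha in sorted(entries, key=_sort_key)
    )


def hash_directory(path: str) -> str:
    """Hex tree hash for `path`, without writing anything to a repository."""
    return hash_object("tree", tree_body(path)).hex()
```

Calling hash_directory("some/directory") on a clean checkout should, under the assumptions above, give the same value as git rev-parse HEAD:some/directory, but I would verify that on your own repository before relying on it.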