I am in charge of several Excel files and SQL schema files. How should I perform better document version control on these files?
I need to know the part modified (di
I've been struggling with this exact problem for the last few days and have written a small .NET utility to extract and normalise Excel files in such a way that they're much easier to store in source control. I've published the executable here:
https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe
...and the source here:
https://bitbucket.org/htilabs/ooxmlunpack
At the moment, you should put the executable in a folder (e.g. the root of your source repository); when you run it, it will:
Clearly not all of these things are necessary, but the end result is a spreadsheet file that still opens in Excel and is much more amenable to diffing and incremental compression. Storing the extracted files as well makes it much more obvious in the version history what changed in each version.
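To give a rough idea of the "extract" part: an .xlsx file is a zip package of XML parts, so the core idea can be sketched in a few lines of Python. This is not the utility's actual code (that is .NET, at the link above). The function name `unpack_ooxml` and the choice to pretty-print with `xml.dom.minidom` are assumptions made for illustration, and the sketch only writes an extracted copy next to the workbook; it does not rewrite the workbook itself.

```python
import zipfile
import xml.dom.minidom
from pathlib import Path


def unpack_ooxml(source: Path, dest: Path) -> list[str]:
    """Extract every part of an OOXML package into dest, pretty-printing XML parts.

    Hypothetical helper for illustration; returns the part names written.
    """
    written = []
    with zipfile.ZipFile(source) as package:
        for name in sorted(package.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            data = package.read(name)
            if name.endswith((".xml", ".rels")):
                # Put each element on its own line so a text diff
                # shows only the elements that actually changed.
                document = xml.dom.minidom.parseString(data)
                data = document.toprettyxml(indent="  ", encoding="utf-8")
            target = dest / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            written.append(name)
    return written
```

Calling `unpack_ooxml(Path("Book1.xlsx"), Path("Book1.xlsx.extracted"))` (both paths are placeholders) would leave a folder of indented XML files that can be committed alongside the workbook.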
If there's any appetite for it, I'm happy to make the tool more configurable, since I guess not everyone will want the contents extracted or the values removed from formula cells, but both are very useful to me at the moment.
In tests, a 2 MB spreadsheet 'unpacks' to 21 MB, but then I was able to store five versions of it with small changes between each, in a 1.9 MB Mercurial data file, and visualise the differences between versions effectively using Beyond Compare in text mode.
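If you don't have Beyond Compare, any text diff works on the extracted parts. As a minimal sketch, here is the same kind of comparison using Python's standard `difflib` instead; the helper name `diff_parts` and the sample cell XML are made up for illustration.

```python
import difflib


def diff_parts(old_text: str, new_text: str, name: str) -> str:
    """Return a unified diff between two versions of one extracted XML part."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(lines)


# Two versions of a (made-up) pretty-printed cell: only the value line differs.
old = '<c r="A1">\n  <v>1</v>\n</c>\n'
new = '<c r="A1">\n  <v>2</v>\n</c>\n'
print(diff_parts(old, new, "xl/worksheets/sheet1.xml"))
```

Because each element sits on its own line after extraction, the diff contains just the changed value line rather than the whole sheet.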
NB: although I'm using Mercurial, there's nothing Mercurial-specific about the solution (I came across this question while researching it), so it should work fine with Git or any other VCS.