I\'ve heard discussion about how OpenOffice (ODF) files are compressed zip files of XML and other data. So making a tiny change to the file can potentially totally change th
You may consider to store documents in FODT-format - flat XML format.
This is relatively new alternative solution available.
Document is just stored unzipped.
More info is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion.
First, version control system you want to use should support hooks which are invoked to transform file from version in repository to the one in working area, like for example clean / smudge filters in Git from gitattributes.
Second, you can find such filter, instead of writing one yourself, for example rezip from "Management of opendocument (openoffice.org) files in git" thread on git mailing list (but see warning in "Followup: management of OO files - warning about "rezip" approach"),
You can also browse answers in "Tracking OpenOffice files/other compressed files with Git" thread, or try to find the answer inside "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.
Hope That Helps
Here's another program I stumbled across: store_zippies_uncompressed by Mirko Friedenhagen.
The wiki also shows how to integrate it with Mercurial.
I've modified the python program in Craig McQueen's answer just a bit. Changes include:
Actually checking the return of testZip (according to the docs, it appears that the original program will happily proceed with a corrupt zip file past the checkzip step).
Rewrite the for-loop to check for already-uncompressed files to be a single if-statement.
Here is the new program:
#!/usr/bin/python
# Note, written for Python 2.6
import sys
import shutil
import zipfile
# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]
backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName
# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
if checkZipFile.testzip() is not None:
raise Exception("Zip file is corrupted")
# Second, check that it's not already uncompressed
if all(f.compress_type==zipfile.ZIP_STORED for f in checkZipFile.infolist()):
raise Exception("File is already uncompressed")
checkZipFile.close()
# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)
inputZipFile = zipfile.ZipFile(inFileName)
outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)
# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
fileData = inputZipFile.read(fileObject)
outFileObject = fileObject
outFileObject.compress_type = zipfile.ZIP_STORED
outputZipFile.writestr(outFileObject, fileData)
outputZipFile.close()
Here is a Python script that I've put together. It's had minimal testing so far. I've done basic testing in Python 2.6. But I prefer the idea of Python in general because it should abort with an exception if any error occurs, whereas a bash script may not.
This first checks that the input file is valid and not already uncompressed. Then it copies the input file to a "backup" file with ".bak" extension. Then it uncompresses the original file, overwriting it.
I'm sure there are things I've overlooked. Please feel free to give feedback.
#!/usr/bin/python
# Note, written for Python 2.6
import sys
import shutil
import zipfile
# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]
backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName
# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
checkZipFile.testzip()
# Second, check that it's not already uncompressed
isCompressed = False
for fileObject in checkZipFile.infolist():
if fileObject.compress_type != zipfile.ZIP_STORED:
isCompressed = True
if isCompressed == False:
raise Exception("File is already uncompressed")
checkZipFile.close()
# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)
inputZipFile = zipfile.ZipFile(inFileName)
outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)
# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
fileData = inputZipFile.read(fileObject)
outFileObject = fileObject
outFileObject.compress_type = zipfile.ZIP_STORED
outputZipFile.writestr(outFileObject, fileData)
outputZipFile.close()
This is in a Mercurial repository in BitBucket.
If you don't need the storage savings, but just want to be able to diff OpenOffice.org files stored in your version control system, you can use the instructions on the oodiff page, which tells how to make oodiff the default diff for OpenDocument formats under git and mercurial. (It also mentions SVN, but it's been so long since I used SVN regularly I'm not sure if those are instructions or limitations.)
(I found this using Mirko Friedenhagen's page (cited by Craig McQueen above))