Uncompress OpenOffice files for better storage in version control

天命终不由人 2020-12-15 07:00

I've heard discussion about how OpenOffice (ODF) files are compressed zip files of XML and other data. So making a tiny change to the file can potentially totally change th
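
To make the concern concrete, here is a minimal sketch comparing a deflated archive with a stored (uncompressed) one. The payload and the helper name `make_archive` are made up for illustration; a real ODF `content.xml` is much larger. The point is only that the stored archive keeps the XML bytes verbatim, which is what gives a delta-based repository something to match between revisions.

```python
import io
import zipfile


def make_archive(payload: bytes, compression: int) -> bytes:
    """Return a one-entry zip archive holding payload as content.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression) as archive:
        archive.writestr("content.xml", payload)
    return buffer.getvalue()


# Hypothetical document body, used only for this demonstration.
marker = b"<text:p>Hello, version control</text:p>"
payload = b"<office:text>" + marker * 20 + b"</office:text>"

deflated = make_archive(payload, zipfile.ZIP_DEFLATED)
stored = make_archive(payload, zipfile.ZIP_STORED)

# Is the raw XML visible inside each archive?
print("XML visible in stored archive:  ", marker in stored)
print("XML visible in deflated archive:", marker in deflated)
print("sizes (deflated, stored):", len(deflated), len(stored))
```

The trade-off is that the working file gets bigger, while the repository can usually compress and delta the stored form itself.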

6 Answers
  • 2020-12-15 07:26

    You might consider storing documents in FODT format, the flat XML variant of OpenDocument.
    This is a relatively new alternative.

    The document is simply stored as a single XML file, without zipping.

    More info is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion.
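
As a rough illustration of why flat XML suits version control, the sketch below diffs two revisions of a heavily trimmed, hypothetical flat-XML fragment with Python's `difflib`. A real .fodt file also carries namespaces, styles, and metadata, and the file names used here are placeholders.

```python
import difflib

# Hypothetical, heavily trimmed flat-XML fragments (not a complete .fodt file).
old_revision = """<office:text>
 <text:p>First paragraph.</text:p>
 <text:p>Second paragraph.</text:p>
</office:text>
"""
new_revision = old_revision.replace("Second paragraph.", "Second paragraph, edited.")

diff = list(difflib.unified_diff(
    old_revision.splitlines(),
    new_revision.splitlines(),
    fromfile="report.fodt (old)",
    tofile="report.fodt (new)",
    lineterm="",
))
print("\n".join(diff))

# Keep only the added and removed lines, skipping the two file header lines.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(len(changed), "changed lines")
```

A one-sentence edit shows up as one removed and one added line, which is the kind of change a text-oriented version control system handles well.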

  • 2020-12-15 07:31

    First, the version control system you want to use should support hooks that transform a file between the version in the repository and the one in the working area, for example the clean / smudge filters that Git configures through gitattributes.

    Second, instead of writing such a filter yourself, you can use an existing one, for example rezip from the "Management of opendocument (openoffice.org) files in git" thread on the git mailing list (but see the warning in "Followup: management of OO files - warning about "rezip" approach").

    You can also browse answers in "Tracking OpenOffice files/other compressed files with Git" thread, or try to find the answer inside "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.
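
For readers who do want to write the filter themselves, here is a minimal sketch of what the clean side could look like in Python. It is not the rezip tool from the mailing list thread; the names `rezip_stored` and `main` are made up for this example. Git runs a clean filter with the working-tree file on stdin and stores whatever the filter writes to stdout. A filter is typically wired up with a `.gitattributes` line such as `*.odt filter=rezip` plus a `filter.rezip.clean` setting in the Git config, where the driver name `rezip` is an arbitrary choice.

```python
import io
import sys
import zipfile


def rezip_stored(data: bytes) -> bytes:
    """Rewrite a zip archive so that every member is stored uncompressed.

    Input that is not a zip archive is returned unchanged.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return data
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_STORED) as target:
        # Member order is preserved, so the ODF "mimetype" entry stays first.
        for member in source.infolist():
            # A fresh entry that keeps name and timestamp, so the same
            # input always yields the same output bytes.
            entry = zipfile.ZipInfo(member.filename, member.date_time)
            entry.compress_type = zipfile.ZIP_STORED
            entry.external_attr = member.external_attr
            target.writestr(entry, source.read(member))
    return output.getvalue()


def main() -> None:
    """Entry point for use as a filter: stdin in, stdout out."""
    sys.stdout.buffer.write(rezip_stored(sys.stdin.buffer.read()))

# When saving this as a filter script, call main() at the bottom of the file.
```

A matching smudge filter is optional with this approach, since office suites can generally open archives whose members are stored rather than deflated. The warning in the follow-up thread mentioned above still applies to any approach of this kind.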

    Hope That Helps

  • 2020-12-15 07:36

    Here's another program I stumbled across: store_zippies_uncompressed by Mirko Friedenhagen.

    The wiki also shows how to integrate it with Mercurial.

  • 2020-12-15 07:40

    I've modified the python program in Craig McQueen's answer just a bit. Changes include:

    • Actually checking the return value of testzip() (according to the docs, it appears that the original program will happily proceed with a corrupt zip file past the check step).

    • Rewriting the for-loop that checks for already-uncompressed files as a single if-statement.

    Here is the new program:

    #!/usr/bin/python
    # Note, written for Python 2.6
    
    import sys
    import shutil
    import zipfile
    
    # Get a single command-line argument containing filename
    commandlineFileName = sys.argv[1]
    
    backupFileName = commandlineFileName + ".bak"
    inFileName = backupFileName
    outFileName = commandlineFileName
    checkFilename = commandlineFileName
    
    # Check input file
    # First, check it is valid (not corrupted)
    checkZipFile = zipfile.ZipFile(checkFilename)
    
    if checkZipFile.testzip() is not None:
        raise Exception("Zip file is corrupted")
    
    # Second, check that it's not already uncompressed
    if all(f.compress_type==zipfile.ZIP_STORED for f in checkZipFile.infolist()):
        raise Exception("File is already uncompressed")
    
    checkZipFile.close()
    
    # Copy to "backup" file and use that as the input
    shutil.copy(commandlineFileName, backupFileName)
    inputZipFile = zipfile.ZipFile(inFileName)
    
    outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)
    
    # Copy each input file's data to output, making sure it's uncompressed
    for fileObject in inputZipFile.infolist():
        fileData = inputZipFile.read(fileObject)
        outFileObject = fileObject
        outFileObject.compress_type = zipfile.ZIP_STORED
        outputZipFile.writestr(outFileObject, fileData)
    
    outputZipFile.close()
    inputZipFile.close()
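
After running the script, one way to convince yourself that nothing was lost is to compare the rewritten file against the ".bak" copy. The helper below is a sketch written for this purpose and is not part of the program above; the name `verify_uncompressed` is made up.

```python
import zipfile


def verify_uncompressed(backup_path: str, rewritten_path: str) -> bool:
    """Return True if rewritten_path has the same members as backup_path,
    byte for byte, and every member is stored without compression."""
    with zipfile.ZipFile(backup_path) as backup, \
            zipfile.ZipFile(rewritten_path) as rewritten:
        # Same member names, in the same order.
        if backup.namelist() != rewritten.namelist():
            return False
        for info in rewritten.infolist():
            if info.compress_type != zipfile.ZIP_STORED:
                return False
            if backup.read(info.filename) != rewritten.read(info.filename):
                return False
    return True
```

For a document named report.odt (a placeholder name), the call would be `verify_uncompressed("report.odt.bak", "report.odt")`.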
    
  • 2020-12-15 07:47

    Here is a Python script that I've put together. It's had minimal testing so far. I've done basic testing in Python 2.6. But I prefer the idea of Python in general because it should abort with an exception if any error occurs, whereas a bash script may not.

    This first checks that the input file is valid and not already uncompressed. Then it copies the input file to a "backup" file with ".bak" extension. Then it uncompresses the original file, overwriting it.

    I'm sure there are things I've overlooked. Please feel free to give feedback.

    
    #!/usr/bin/python
    # Note, written for Python 2.6
    
    import sys
    import shutil
    import zipfile
    
    # Get a single command-line argument containing filename
    commandlineFileName = sys.argv[1]
    
    backupFileName = commandlineFileName + ".bak"
    inFileName = backupFileName
    outFileName = commandlineFileName
    checkFilename = commandlineFileName
    
    # Check input file
    # First, check it is valid (not corrupted)
    checkZipFile = zipfile.ZipFile(checkFilename)
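    # Note: testzip() returns the name of the first bad member, or None;
    # the return value is not checked here
    checkZipFile.testzip()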
    
    # Second, check that it's not already uncompressed
    isCompressed = False
    for fileObject in checkZipFile.infolist():
        if fileObject.compress_type != zipfile.ZIP_STORED:
            isCompressed = True
    if isCompressed == False:
        raise Exception("File is already uncompressed")
    
    checkZipFile.close()
    
    # Copy to "backup" file and use that as the input
    shutil.copy(commandlineFileName, backupFileName)
    inputZipFile = zipfile.ZipFile(inFileName)
    
    outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)
    
    # Copy each input file's data to output, making sure it's uncompressed
    for fileObject in inputZipFile.infolist():
        fileData = inputZipFile.read(fileObject)
        outFileObject = fileObject
        outFileObject.compress_type = zipfile.ZIP_STORED
        outputZipFile.writestr(outFileObject, fileData)
    
    outputZipFile.close()
    inputZipFile.close()
    
    

    This is in a Mercurial repository in BitBucket.

  • 2020-12-15 07:48

    If you don't need the storage savings, but just want to be able to diff OpenOffice.org files stored in your version control system, you can use the instructions on the oodiff page, which explains how to make oodiff the default diff for OpenDocument formats under Git and Mercurial. (It also mentions SVN, but it's been so long since I used SVN regularly that I'm not sure if those are instructions or limitations.)
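
To give a feel for what a diff helper of this kind has to do, here is a minimal sketch that pulls the paragraph text out of an ODF text document so that two revisions can be compared as plain text. This is not oodiff itself; the function name `odt_paragraphs` is made up, and the sketch ignores tables, footnotes, and nested paragraphs. In Git, a converter like this is normally hooked in through the textconv setting of a diff driver.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the ODF text elements (text:p, text:h).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(odt_bytes: bytes) -> list:
    """Return the text of each heading and paragraph found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}
    # itertext() also collects text inside child elements such as text:span.
    return ["".join(node.itertext()) for node in root.iter() if node.tag in wanted]
```

Printing one paragraph per line for the old and new revisions gives output that an ordinary line-based diff can compare.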

    (I found this using Mirko Friedenhagen's page (cited by Craig McQueen above))
