How to identify an ODF file?

Submitted by 大憨熊 on 2019-12-23 23:29:40

Question


I need to be able to identify that a given file is an ODF file based on the contents of the file, and not on the file's extension.

ODF files are really a collection of XML files in a zip container, which means that I cannot use the file's magic number as it will just indicate that it is a zip file.

So what I'm really asking is: are there any files that are required to be present in an ODF container? If so, the presence of that file in a zip container indicates that it is probably an ODF file, and its absence indicates that it definitely is not.
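The heuristic proposed here can be sketched in Python with the standard zipfile module. A zip archive that lacks a required entry is rejected, and one that has it is accepted as "probably ODF". This is a minimal sketch. The function name looks_like_odf is hypothetical, and defaulting the required entry to mimetype is an assumption borrowed from the first answer below, not a statement of what the specification mandates.

```python
import zipfile


def looks_like_odf(path: str, required_entry: str = "mimetype") -> bool:
    """Return False if `path` is definitely not ODF, True if it probably is.

    `required_entry` defaults to "mimetype" as an assumption; pass a
    different name if the spec you follow requires a different file.
    """
    # is_zipfile only confirms a readable zip structure, which is all the
    # zip magic number can tell us about the file.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # The absence of the required entry rules ODF out; its presence
        # only makes ODF likely.
        return required_entry in zf.namelist()
```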


Answer 1:


Why not check out the ODF Technical Specification? The mimetype file it describes is probably the ideal thing to check: just look for the vnd.oasis.opendocument string inside the mimetype file.
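Here is a Python sketch of the mimetype check that uses only the standard zipfile module. It is stricter than the answer. It also assumes, per the ODF packaging rules as I understand them, that mimetype is the first entry in the archive and is stored uncompressed. Drop those two checks if you need to accept files from writers that don't follow that rule. The function name odf_mimetype is hypothetical.

```python
import zipfile
from typing import Optional

# Marker string suggested in the answer above.
ODF_MARKER = "vnd.oasis.opendocument"


def odf_mimetype(path: str) -> Optional[str]:
    """Return the media type from the package's `mimetype` entry if it
    looks like ODF, otherwise None."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        entries = zf.infolist()
        # Assumed packaging rule: `mimetype` is the first entry...
        if not entries or entries[0].filename != "mimetype":
            return None
        # ...and it is stored without compression.
        if entries[0].compress_type != zipfile.ZIP_STORED:
            return None
        raw = zf.read("mimetype")
    media_type = raw.decode("ascii", errors="replace").strip()
    return media_type if ODF_MARKER in media_type else None
```

Returning the media type instead of a bare boolean lets the caller also distinguish, for example, a text document (application/vnd.oasis.opendocument.text) from a spreadsheet (application/vnd.oasis.opendocument.spreadsheet).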




Answer 2:


As I understand it, there will always be one or more .xml files in the root of the archive, and these XML files will always contain the string <office:document very near the beginning.

All those I have seen seem to contain a file called "content.xml" in the root, which does contain this string.

Not many applications write ODF documents, and in the past there was basically just one. So it shouldn't be too difficult to install some ancient version of OpenOffice, save a few files, and check that the rule holds for them as it does for current ODF files.

I would test with something like this on a batch of known ODF files to check whether it is reliable:

$ unzip -c "$FILE" content.xml | grep -q '<office:document' && echo yes || echo NO
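If you'd rather run the same check from a script over a whole batch of files, here is a rough Python equivalent of the one-liner above. The function name and the 4096-byte limit that stands in for "very near the beginning" are arbitrary choices, not anything taken from the ODF specification. The root element of content.xml is office:document-content, which is why the '<office:document' prefix matches it.

```python
import sys
import zipfile


def content_has_office_document(path: str, limit: int = 4096) -> bool:
    """Python port of the shell check: does content.xml contain
    '<office:document' within its first `limit` bytes?"""
    try:
        with zipfile.ZipFile(path) as zf:
            with zf.open("content.xml") as f:
                head = f.read(limit)
    except (zipfile.BadZipFile, KeyError, OSError):
        # Not a zip, no content.xml entry, or the file can't be read.
        return False
    return b"<office:document" in head


if __name__ == "__main__":
    # Print yes/NO per file, mirroring the shell one-liner.
    for name in sys.argv[1:]:
        print(name, "yes" if content_has_office_document(name) else "NO")
```

Saved as, say, check_odf.py, it can be run as `python check_odf.py *.odt *.ods` to check a batch of files in one go.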



Answer 3:


In an OpenOffice Basic macro, read the document's Build ID; if it is missing, the document is not ODF.

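' Runs on the document currently open in OpenOffice
Dim oDoc As Object
Dim bIsNotODF As Boolean
oDoc = ThisComponent
bIsNotODF = FALSE
If oDoc.BuildID = "" Then
    bIsNotODF = TRUE   ' no Build ID recorded, so not ODF
End If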


Source: https://stackoverflow.com/questions/1817908/how-to-identify-an-odf-file
