How to retrieve the author of an office file in python?

情话喂你 2020-11-30 12:58

The title explains the problem: I have .doc and .docx files and I want to retrieve their author information so that I can restructure my files.

os.stat

4 Answers
  •  盖世英雄少女心
    2020-11-30 13:37

    Since docx files are just zipped XML, you can unzip the docx file and pull the author information out of one of the XML files inside. I'm not quite sure where it's stored, but a brief look suggests it's kept as dc:creator in docProps/core.xml.

    Here's how you can open the docx file and retrieve the creator:

    import zipfile
    import lxml.etree
    
    # open the docx as a zip archive and read the core properties part
    with zipfile.ZipFile('my_doc.docx') as zf:
        core_xml = zf.read('docProps/core.xml')
    # use lxml to parse the xml file we are interested in
    doc = lxml.etree.fromstring(core_xml)
    # retrieve creator (dc:creator is in the Dublin Core namespace)
    ns = {'dc': 'http://purl.org/dc/elements/1.1/'}
    creator = doc.xpath('//dc:creator', namespaces=ns)[0].text
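
    If you'd rather not depend on lxml, the same lookup can be sketched with only the standard library. The function name get_docx_author is made up for illustration, and returning None for unreadable files is an assumption about what is convenient when sorting many files, not something the docx format requires:

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for dc:creator in docProps/core.xml
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def get_docx_author(path):
    """Return the dc:creator text of a .docx file, or None if unavailable.

    Hypothetical helper: `path` may be a filename or a binary file object.
    """
    try:
        with zipfile.ZipFile(path) as zf:
            core_xml = zf.read("docProps/core.xml")
    except (zipfile.BadZipFile, KeyError):
        # Not a zip archive (e.g. a legacy .doc), or no core properties part
        return None
    creator = ET.fromstring(core_xml).find(".//dc:creator", DC_NS)
    return creator.text if creator is not None else None
```

    Calling get_docx_author('my_doc.docx') should give the same string as the lxml version. Legacy .doc files are not zip archives, so this sketch returns None for them; they would need a different reader.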
    
