Extract DOCX Comments

太阳男子 2020-12-20 03:04

I'm a teacher. I want a list of all the students who commented on the essay I assigned, and what they said. The Drive API stuff was too challenging for me, but I figured I

3 Answers
  •  醉话见心
    2020-12-20 03:55

    You got remarkably far considering that OOXML is such a complex format.
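
    Part of what makes the format feel complex is that a .docx file is not a single document but a ZIP archive of XML parts. Word conventionally stores the body in word/document.xml and the comments in word/comments.xml. Strictly speaking, the comments part's location is declared in word/_rels/document.xml.rels, so the fixed path below is an assumption. The following is a minimal sketch that lists those parts with Python's standard zipfile module. The helper names docx_parts and has_comments_part are hypothetical.

```python
import zipfile

def docx_parts(docx_path):
    """Return the names of the parts stored in a .docx (a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as z:
        return z.namelist()

def has_comments_part(docx_path):
    """True if the archive contains the conventional comments part.

    Assumes the usual part name word/comments.xml rather than resolving
    it through word/_rels/document.xml.rels.
    """
    return 'word/comments.xml' in docx_parts(docx_path)
```

    For example, print(docx_parts('essay.docx')) prints every part in the archive ('essay.docx' is a hypothetical file name). Word normally writes word/comments.xml only when the document actually contains comments.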

    Here's some sample Python code showing how to access the comments of a DOCX file via XPath:

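    from lxml import etree
    import zipfile
    
    ooXMLns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
    
    def get_comments(docxFileName):
        # A .docx file is a ZIP archive; the comments live in word/comments.xml
        with zipfile.ZipFile(docxFileName) as docxZip:
            if 'word/comments.xml' not in docxZip.namelist():
                return  # this document has no comments part
            commentsXML = docxZip.read('word/comments.xml')
        et = etree.XML(commentsXML)
        comments = et.xpath('//w:comment', namespaces=ooXMLns)
        for c in comments:
            # attributes of the <w:comment> element (string() yields text, not a list):
            print(c.xpath('string(@w:author)', namespaces=ooXMLns))
            print(c.xpath('string(@w:date)', namespaces=ooXMLns))
            # string value of the comment (all of its text concatenated):
            print(c.xpath('string(.)', namespaces=ooXMLns))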
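
    For the original goal (who commented and what they said), the comments can be grouped by their w:author attribute. The following is a minimal sketch that uses only the standard library, with xml.etree.ElementTree in place of lxml, so nothing extra needs to be installed. The function name comments_by_author is hypothetical. It assumes the author attribute holds a name that identifies each student, and that the comments are in the conventional word/comments.xml part.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# WordprocessingML namespace used by the elements in word/comments.xml
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
W = '{%s}' % W_NS

def comments_by_author(docx_path):
    """Map each comment author to a list of (date, text) tuples."""
    with zipfile.ZipFile(docx_path) as z:
        # Assumption: comments are in the conventional part; a document
        # without comments usually has no such part at all.
        if 'word/comments.xml' not in z.namelist():
            return {}
        root = ET.fromstring(z.read('word/comments.xml'))
    result = defaultdict(list)
    for c in root.iter(W + 'comment'):
        author = c.get(W + 'author', '')
        date = c.get(W + 'date', '')
        # Concatenate the <w:t> text runs of each paragraph, then join
        # the comment's paragraphs with newlines.
        paragraphs = [''.join(t.text or '' for t in p.iter(W + 't'))
                      for p in c.iter(W + 'p')]
        result[author].append((date, '\n'.join(paragraphs)))
    return dict(result)
```

    A call such as comments_by_author('essay.docx') (hypothetical file name) returns one entry per student, each holding that student's comments in document order. Splitting on paragraphs keeps multi-paragraph comments readable, whereas XPath's string(.) runs them together.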
    
