How to extract text from an existing docx file using python-docx

不思量自难忘° 2020-11-27 15:59

I'm trying to use the python-docx module (pip install python-docx), but it is very confusing: in the GitHub repo's test samples they are using

7 answers
  •  一向 (original poster)
    2020-11-27 16:32

    Using python-docx, as @Chinmoy Panda's answer shows:

    fullText = []
    for para in doc.paragraphs:  # doc is an already opened docx.Document
        fullText.append(para.text)
    

    However, para.text will lose any text nested inside a w:smartTag element (the corresponding GitHub issue is here: https://github.com/python-openxml/python-docx/issues/328). You should use the following function instead:

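    def para2text(p):
        # Collect every w:t node under the paragraph, including those nested in w:smartTag.
        rs = p._element.xpath('.//w:t')
        # Join without a separator so words split across runs stay intact;
        # an empty w:t has text None, so fall back to an empty string.
        return u"".join([r.text or u"" for r in rs])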
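
To see why the descendant search matters, here is a minimal sketch using only the standard library rather than python-docx. The paragraph XML, the smart tag attributes, and the two helper names are made up for illustration, and the "direct runs only" helper is a simplified model of the behaviour described in the linked issue, not python-docx's actual implementation.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical paragraph: one ordinary run plus one run wrapped in w:smartTag.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">Meet me in </w:t></w:r>
  <w:smartTag w:uri="urn:example" w:element="City">
    <w:r><w:t>Paris</w:t></w:r>
  </w:smartTag>
</w:p>
"""


def direct_run_text(p):
    """Text from runs that are direct children of the paragraph only."""
    return "".join(t.text or "" for t in p.findall("w:r/w:t", NS))


def all_descendant_text(p):
    """Text from every w:t descendant, mirroring the './/w:t' XPath."""
    return "".join(t.text or "" for t in p.findall(".//w:t", NS))


paragraph = ET.fromstring(PARAGRAPH_XML)
print(direct_run_text(paragraph))      # misses the smart-tagged run
print(all_descendant_text(paragraph))  # includes it
```

The only difference between the two helpers is the path: "w:r/w:t" stops at the paragraph's own runs, while ".//w:t" walks into wrappers such as w:smartTag.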
    
