Extracting data from docx files in Python [closed]

Submitted by 倾然丶 夕夏残阳落幕 on 2020-01-03 02:42:07

Question


I want to extract data from a Word document (.docx) that contains a table. I need to fetch the data from every row and column of the table.

I would then like to process the data and insert it into an Excel file under the respective fields.

Can anyone please guide me on how to do this in Python?

I am using Python 3 on Windows 7 (and might also want to run this code on Windows Server 2003).

Any help will be much appreciated.

Thanks


Answer 1:


Try something like:

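import win32com.client as w32c

# Start Word through COM (requires Windows, Microsoft Word, and pywin32)
word = w32c.Dispatch("Word.Application")
word.Visible = True
doc = word.Documents.Open("C:\\docx_with_a_table.docx")
try:
    tables = doc.Tables
    # Word collections are 1-based, so use Item(1) .. Item(Count)
    for t_idx in range(1, tables.Count + 1):
        table = tables.Item(t_idx)
        for r_idx in range(1, table.Rows.Count + 1):
            row = table.Rows.Item(r_idx)
            for c_idx in range(1, row.Cells.Count + 1):
                cell = row.Cells.Item(c_idx)
                # Cell text ends with an end-of-cell marker ("\r\x07"); strip it
                print(cell.Range.Text.rstrip("\r\x07"))
finally:
    # Close the document without saving and shut Word down
    doc.Close(False)
    word.Quit()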

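Pressing ALT+F11 in Word opens the VBA editor, and F2 there opens the Object Browser, which lists the objects used above (Documents, Tables, Rows, Cells, Range). The same procedure is better documented for Perl.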
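
If Word is not installed on the machine, a rough alternative is to read the .docx directly, since a .docx file is a ZIP archive whose main part is word/document.xml. The sketch below uses only the standard library; the function name read_docx_tables is hypothetical, and it assumes simple tables (rows that are direct children of the table element, no content controls wrapping rows or cells). Nested tables are returned as separate entries and their text is not repeated in the outer cell.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(path_or_file):
    """Return every table as a list of rows, each row a list of cell strings."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # Join the text runs of each paragraph, one line per paragraph
                paragraphs = ["".join(t.text or "" for t in p.iter(W + "t"))
                              for p in tc.findall(W + "p")]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling read_docx_tables("C:\\docx_with_a_table.docx") would give a nested list that can be looped over in the same way as the COM example, without starting Word.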

Reading and writing Excel files is well supported by the Python 3 packages xlrd3 and xlwt3.
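
The exact API of those packages is not shown here. As a dependency-free sketch of the "insert it under the respective fields" step, the extracted rows can instead be written as a CSV file, which Excel opens directly. This swaps CSV in for a native workbook; the function name write_rows_to_csv and the header argument are hypothetical.

```python
import csv


def write_rows_to_csv(rows, path, header=None):
    """Write table rows (lists of strings) to a CSV file that Excel can open."""
    # newline="" lets the csv module control line endings;
    # utf-8-sig writes a BOM so Excel can detect the encoding
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)  # column names, one per field
        writer.writerows(rows)
```

For example, write_rows_to_csv(rows, "C:\\table.csv", header=["Name", "Age"]) would put each extracted row under the given column names.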



Source: https://stackoverflow.com/questions/10360339/extracting-data-from-docx-files-in-python
