How to read the contents of a table in an MS Word file using Python?

Anonymous (unverified), submitted on 2019-12-03 02:14:01

Question:

How can I read and process contents of every cell of a table in a DOCX file?

I am using Python 3.2 on Windows 7 and PyWin32 to access the MS-Word Document.

I am a beginner, so I don't know the proper way to reach the table cells. So far I have only done this:

import win32com.client as win32

word = win32.gencache.EnsureDispatch('Word.Application')
word.Visible = False

doc = word.Documents.Open("MyDocument")

Answer 1:

Here is what works for me in Python 2.7:

import win32com.client as win32

word = win32.Dispatch("Word.Application")
word.Visible = 0
word.Documents.Open("MyDocument")
doc = word.ActiveDocument

To see how many tables your document has:

doc.Tables.Count 

Then you can select the table you want by its index. Note that, unlike Python, COM indexing starts at 1:

table = doc.Tables(1) 

To select a cell:

table.Cell(Row=1, Column=1)

To get its content:

table.Cell(Row=1, Column=1).Range.Text

Hope that this helps.

EDIT:

An example of a function that returns a column's index based on its heading:

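def Column_index(header_text):
    # Word appends an end-of-cell marker ("\r\x07") to each cell's text,
    # so strip it before comparing with the header.
    for i in range(1, table.Columns.Count + 1):
        if table.Cell(Row=1, Column=i).Range.Text.rstrip("\r\x07") == header_text:
            return i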

Then you can access the cell you want, for example:

table.Cell(Row=1, Column=Column_index("The Column Header")).Range.Text
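
To process every cell rather than one at a time, the same calls can be wrapped in a loop. The following is a minimal sketch and not part of the original answer: `table_to_rows` is a hypothetical helper name, and it assumes the table is a regular grid with no merged cells (Word raises a COM error when `Cell` is asked for a position that does not exist). It also strips the end-of-cell marker (`\r\x07`) that Word appends to each cell's `Range.Text`.

```python
def table_to_rows(table):
    """Return the text of every cell in a Word COM table as a list of rows.

    `table` is expected to be a COM Table object such as doc.Tables(1).
    Assumes a regular grid with no merged cells.
    """
    rows = []
    # COM collections are 1-based, so count from 1 to Count inclusive.
    for r in range(1, table.Rows.Count + 1):
        row = []
        for c in range(1, table.Columns.Count + 1):
            text = table.Cell(Row=r, Column=c).Range.Text
            # Word ends each cell's text with "\r\x07"; drop that marker
            # (this also drops any trailing paragraph marks).
            row.append(text.rstrip("\r\x07"))
        rows.append(row)
    return rows
```

With a document opened as shown earlier, `table_to_rows(doc.Tables(1))` would give a nested list that can be looped over or indexed like any other Python list.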


Answer 2:

Jumping in rather late, but I thought I'd put this out anyway: now (2015), you can use the pretty neat python-docx library: https://python-docx.readthedocs.org/en/latest/. And then:

from docx import Document

wordDoc = Document('')

for table in wordDoc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
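
If the cell text is needed for further processing rather than printing, the same loops can collect it into nested lists. This is a minimal sketch: `tables_as_rows` is a hypothetical helper name, it relies only on the `tables`, `rows`, `cells`, and `text` attributes used above, and the file path in the comment is a placeholder.

```python
def tables_as_rows(document):
    """Return every table in a python-docx Document as a list of rows,
    where each row is a list of cell strings."""
    return [
        [[cell.text for cell in row.cells] for row in table.rows]
        for table in document.tables
    ]

# Typical use (requires python-docx; "example.docx" is a placeholder path):
#   from docx import Document
#   for table in tables_as_rows(Document("example.docx")):
#       for row in table:
#           print(row)
```

Returning plain lists keeps the rest of the processing independent of the library, so the result can be handed to `csv.writer` or similar without further conversion.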


Answer 3:

I found a simple code snippet in a blog post, "Reading Table Contents Using Python" by etienne.

The great thing about this is that you don't need any non-standard Python libraries installed.

The format of a docx file is described by the Office Open XML standard.

import zipfile
import xml.etree.ElementTree

WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
TABLE = WORD_NAMESPACE + 'tbl'
ROW = WORD_NAMESPACE + 'tr'
CELL = WORD_NAMESPACE + 'tc'

with zipfile.ZipFile('') as docx:
    tree = xml.etree.ElementTree.XML(docx.read('word/document.xml'))

for table in tree.iter(TABLE):
    for row in table.iter(ROW):
        for cell in row.iter(CELL):
            print(''.join(node.text for node in cell.iter(TEXT)))
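
The same approach can be wrapped in a function that returns the cell text instead of printing it. This is a sketch under a few assumptions: `read_docx_tables` and the short constant `W` are hypothetical names, rows and cells are taken to be direct children of their table and row elements (so content wrapped in other elements, such as content controls, would be missed), and a nested table is reported as its own entry rather than as part of the outer cell.

```python
import zipfile
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def read_docx_tables(path):
    """Return a list of tables; each table is a list of rows of cell strings."""
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read('word/document.xml'))
    tables = []
    for tbl in root.iter(W + 'tbl'):
        rows = []
        # findall() looks at direct children only, so rows and cells of a
        # nested table are not mixed into the outer table's rows.
        for tr in tbl.findall(W + 'tr'):
            cells = []
            for tc in tr.findall(W + 'tc'):
                # One string per paragraph, joined with newlines.
                paragraphs = [
                    ''.join(t.text or '' for t in p.iter(W + 't'))
                    for p in tc.findall(W + 'p')
                ]
                cells.append('\n'.join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Joining paragraphs with a newline keeps multi-paragraph cells readable, and `t.text or ''` guards against empty text elements.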

