Read Word document (*.doc) content, including tables

Submitted by 人走茶凉 on 2020-01-11 05:27:09

Question


I have a Word 2003 document and I am using PowerShell to parse its content. The document contains a few lines of text at the top, a dozen tables with differing numbers of columns, and then some more text.

I would like to read the document roughly as follows:

  1. Read the document (create the necessary objects, etc.)
  2. Get each line of text
  3. If it is not part of a table, process it as text and Write-Output it
  4. Else
  5. If it is part of a table
  6. Get the table number (by order) and parse the output based on its columns
  7. End if
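The steps above can be sketched in PowerShell roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it treats each Word paragraph as a "line", uses `Range.Information(12)` (12 is the value of `wdWithInTable` in Word's WdInformation enumeration) to test whether a paragraph sits inside a table, and finds the table number by comparing the paragraph's start position against each top-level table's `Range.Start`/`Range.End`. Nested tables are not handled. `Get-TableIndex` and `Read-WordContent` are hypothetical helper names.

```powershell
# Hypothetical helper: return the 1-based number of the table whose
# [Start, End) character span contains $Position, or 0 if none does.
function Get-TableIndex {
    param(
        [int]$Position,
        [object[]]$TableSpans   # each item: @{ Start = <int>; End = <int> }
    )
    for ($i = 0; $i -lt $TableSpans.Count; $i++) {
        if ($Position -ge $TableSpans[$i].Start -and $Position -lt $TableSpans[$i].End) {
            return $i + 1
        }
    }
    return 0
}

# Hypothetical driver: walk the paragraphs in document order and tag each
# one as body text or as belonging to table N (top-level tables only).
function Read-WordContent {
    param([string]$Path)            # full path to the .doc file
    $wdWithInTable = 12             # WdInformation.wdWithInTable
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        $doc = $word.Documents.Open($Path)
        # Record each table's character span once, in document order
        $spans = @(foreach ($t in $doc.Tables) {
            @{ Start = $t.Range.Start; End = $t.Range.End }
        })
        foreach ($para in $doc.Paragraphs) {
            $range = $para.Range
            if ($range.Information($wdWithInTable)) {
                $n = Get-TableIndex -Position $range.Start -TableSpans $spans
                # Parse according to table $n's layout here, e.g. using
                # $doc.Tables.Item($n).Columns.Count
                "[table $n] " + $range.Text.TrimEnd([char]13, [char]7)
            }
            else {
                # Plain text paragraph: drop the trailing paragraph mark
                $range.Text.TrimEnd([char]13)
            }
        }
        $doc.Close(0)               # 0 = wdDoNotSaveChanges
    }
    finally {
        $word.Quit()
    }
}
```

Usage would be something like `Read-WordContent -Path 'C:\docs\report.doc'` (placeholder path). Each table cell contains at least one paragraph, so table content arrives cell by cell; if you would rather handle a table in one go, you could process `$doc.Tables.Item($n)` once whenever `$n` changes.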

Below is the PowerShell script I have started writing:

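# $filename: full path to the .doc file
$objWord = New-Object -ComObject Word.Application
$objWord.Visible = $false
$objDocument = $objWord.Documents.Open($filename)
$paras = $objDocument.Paragraphs
foreach ($para in $paras)
{
    # Range.Text includes the trailing paragraph mark
    Write-Output $para.Range.Text
}
# Close the document and Word so no hidden WINWORD process is left running
$objDocument.Close()
$objWord.Quit()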

I am not sure whether Paragraphs is what I want, or whether something more suitable exists for my purpose. Right now I simply get the entire content of the document. How do I control what I get? For example, I want to get a line, determine whether it is part of a table, and take an action based on which table (by number) it belongs to.


Answer 1:


You can enumerate the tables in a Word document via the Tables collection. The Rows and Columns properties let you determine the number of rows and columns in a given table, and individual cells can be accessed with the table's Cell(row, column) method, which returns a Cell object.
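When rows have different numbers of cells (merged or split cells), looping `Cell(r, c)` over `Rows.Count` by `Columns.Count` can request cells that do not exist. One alternative, sketched here under that assumption, is to walk `Table.Range.Cells` and use each cell's `RowIndex` and `ColumnIndex`. `ConvertTo-TableRows` and `Get-WordTableRows` are hypothetical helper names; the grouping step is plain PowerShell, so it can be tried without Word.

```powershell
# Hypothetical helper: group cell-like objects (anything exposing RowIndex,
# ColumnIndex and Text) into one string array per row, ordered by column.
function ConvertTo-TableRows {
    param([object[]]$Cells)
    if (-not $Cells) { return }
    $Cells |
        Group-Object RowIndex |
        Sort-Object { [int]$_.Name } |
        ForEach-Object {
            # The leading comma emits each row as a single array object
            ,@($_.Group | Sort-Object ColumnIndex | ForEach-Object { $_.Text })
        }
}

# Hypothetical wrapper for a Word Table object (e.g. $doc.Tables.Item(1)).
# Range.Cells should visit every cell, even in rows with merged/split cells.
function Get-WordTableRows {
    param($Table)
    $cells = foreach ($cell in $Table.Range.Cells) {
        [pscustomobject]@{
            RowIndex    = $cell.RowIndex
            ColumnIndex = $cell.ColumnIndex
            # Drop the end-of-cell marker ([char]13 + [char]7)
            Text        = $cell.Range.Text.TrimEnd([char]13, [char]7)
        }
    }
    ConvertTo-TableRows -Cells $cells
}
```

For example, assuming `$doc` is an open document, `Get-WordTableRows -Table ($doc.Tables.Item(3)) | ForEach-Object { $_ -join ', ' }` would print the third table one row per line, however many cells each row has.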

The following example prints the text of the cell in the last row and last column of each table in the document:

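$wd = New-Object -ComObject Word.Application
$wd.Visible = $true
$doc = $wd.Documents.Open($filename)   # $filename: full path to the .doc file
$doc.Tables | ForEach-Object {
  # Cell(row, column) is 1-based; the returned text ends with Word's end-of-cell marker.
  # This call fails if the last row has fewer cells than Columns.Count (merged cells).
  $_.Cell($_.Rows.Count, $_.Columns.Count).Range.Text
}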
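The text printed by the example above still carries Word's end-of-cell marker, a carriage return followed by a bell character (`[char]13` + `[char]7`), which tends to show up as stray characters or line breaks. A small pipeline-friendly cleanup step, sketched below under the hypothetical name `ConvertFrom-WordCellText`, can strip it before further parsing:

```powershell
# Hypothetical helper: remove the trailing end-of-cell marker
# ([char]13 carriage return + [char]7 bell) from a cell's Range.Text.
function ConvertFrom-WordCellText {
    param([Parameter(ValueFromPipeline)][string]$Text)
    process {
        # TrimEnd removes any run of trailing CR/bell characters
        $Text.TrimEnd([char]13, [char]7)
    }
}
```

In the loop above you would pipe the cell text through it, e.g. `$_.Cell(1, 1).Range.Text | ConvertFrom-WordCellText`. Note that trimming every trailing CR also drops an empty final paragraph inside a cell, which is usually what you want when parsing.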


Source: https://stackoverflow.com/questions/13105142/read-word-document-doc-content-with-tables-etc
