PowerShell search matching string in Word document

Submitted by 放荡痞女 on 2019-12-10 08:46:50

Question


I have a simple requirement. I need to search for a string in a Word document and get back the matching line, or a few words around the match.

So far, I can search for a string across a folder of Word documents, but the result is only True or False depending on whether the search string was found.

#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\MORLAB"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "CRHPCD01"

Function getStringMatch
{
  # Loop through all *.doc files in the $path directory
  Foreach ($file In $files)
  {
   $document = $application.documents.open($file.FullName,$false,$true)
   $range = $document.content
   $wordFound = $range.find.execute($findText)

   if($wordFound) 
    { 
     "$($file.FullName) has $wordFound" | Out-File $output -Append
    }

   # Close each document before opening the next one
   $document.close()
  }
$application.quit()
}

getStringMatch

Answer 1:


#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\Temp"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\temp\wordfiletry.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30
# An array, not a hashtable, so result objects can be appended with +=
$results = @()

Function getStringMatch
{
    # Loop through all *.doc and *.docx files under $path
    Foreach ($file In $files)
    {
        # Open read-only, without the conversion prompt
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content

        # {0,n} rather than {n}, so a hit near the start or end of the text still matches
        If($range.Text -match ".{0,$($charactersAround)}$($findtext).{0,$($charactersAround)}"){
             $properties = @{
                File = $file.FullName
                Match = $findtext
                TextAround = $Matches[0] 
             }
             $results += New-Object -TypeName PsCustomObject -Property $properties
        }

        # Close each document before opening the next one
        $document.close()
    }

    If($results){
        $results | Export-Csv $output -NoTypeInformation
    }

    $application.quit()
}

getStringMatch

import-csv $output

There are a couple of ways to get what you want. Since you already have the text of the document, a simple approach is to run a regex match against it and return the matched text along with some context. That addresses the requirement to get some words around the match.

The variable $charactersAround sets the number of characters to match on either side of $findtext. I also thought the output was a better fit for a CSV file, so $results collects one object per match, each built from a hashtable of properties, and the collection is exported to a CSV file at the end.

Be sure to change the variables for your own testing. Using regex to locate the matches also opens up a world of possibilities.

Sample Output

Match TextAround                                                        File                          
----- ----------                                                        ----                          
First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx
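
The `-match` operator used above stops at the first hit in each document. As one illustration of what the regex approach allows, here is a minimal sketch that returns every hit with its surrounding text. The function name `Get-TextAround` and its parameters are hypothetical and not part of the answer. It takes plain text rather than a Word document so it can be tried without Word, and it assumes the search text should be matched literally, which is why it is escaped.

```powershell
# Hypothetical helper, not part of the original answer: returns every
# occurrence of a literal search string with up to $CharactersAround
# characters of context on each side.
function Get-TextAround {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $FindText,
        [int] $CharactersAround = 30
    )

    # {0,n} lets a hit near the start or end of the text still match.
    $context = ".{0,$CharactersAround}"

    # Escape the search text so characters such as '.' or '(' match literally.
    $pattern = $context + [regex]::Escape($FindText) + $context

    # Matches do not overlap: a second hit that falls inside the trailing
    # context of the previous match is not reported separately.
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        [pscustomobject]@{
            Match      = $FindText
            TextAround = $m.Value
            Index      = $m.Index
        }
    }
}

$sample = 'Bradley Air Services Limited dba First Air meets or exceeds all terms.'
Get-TextAround -Text $sample -FindText 'First' -CharactersAround 10
```

In the script above, `$range.Text` would be passed as `-Text`, and the returned objects could be collected and sent to `Export-Csv` in the same way as `$results`.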



Answer 2:


Thanks! You provided a great solution that uses PowerShell regular expressions to look for information in a Word document. I needed to modify it to meet my needs; maybe it will help someone else. It reads each line of the Word document and uses the regex to determine whether the line is a match. The output could easily be modified or dumped to a log file.

    Set-StrictMode -Version latest
    $path = "c:\Temp\pii"
    $files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
    $application = New-Object -comobject word.application
    $application.visible = $False
    $findtext = "[0-9]" #regex

    Function getStringMatch
    {
        # Loop through all *.doc and *.docx files under $path
        Foreach ($file In $files) {
            $document = $application.documents.open($file.FullName,$false,$true)
            # Word ends each paragraph with a carriage return (char 13), so split
            # on that; Split() with no arguments splits on every whitespace
            # character and yields words rather than lines
            $arrContents = $document.content.text.Split([char]13)
            $varCounter = 0
            ForEach ($line in $arrContents) {
                $varCounter++
                If($line -match $findtext) {
                    "File: $($file.FullName) Found: $line Line: $varCounter"
                }
            }
            $document.close()
        }
        $application.quit()
    }

    getStringMatch
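
The remark that the output could easily be modified can be sketched as follows: return one object per matching line instead of a formatted string, so the caller decides whether to print, log, or export it. The function name `Find-MatchingLine` and its parameters are hypothetical and not from the answer. The sketch works on plain text and assumes paragraphs are separated by a carriage return (char 13), which is how Word's `Content.Text` delimits them.

```powershell
# Hypothetical helper, not part of the original answer: tests each
# paragraph of a text against a regex and returns one object per match.
function Find-MatchingLine {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $Pattern
    )

    # Assumption: paragraphs are delimited by carriage returns (char 13).
    $lines = $Text.Split([char]13)

    for ($i = 0; $i -lt $lines.Count; $i++) {
        if ($lines[$i] -match $Pattern) {
            [pscustomobject]@{
                Line = $i + 1        # 1-based paragraph number
                Text = $lines[$i]
            }
        }
    }
}

# `r is PowerShell's escape sequence for a carriage return.
$sample = "Name: Alice`rPhone: 555-0100`rNo digits here"
Find-MatchingLine -Text $sample -Pattern '[0-9]'
```

Inside the loop over `$files`, `$document.Content.Text` would be passed as `-Text`, and the resulting objects could be piped to `Export-Csv` or `Out-File` to produce a log.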


Source: https://stackoverflow.com/questions/27169043/powershell-search-matching-string-in-word-document
