Basic PowerShell - batch convert Word Docx to PDF

让人想犯罪 __ submitted on 2019-11-28 03:09:38

This will work for .doc as well as .docx files.

$documents_path = 'c:\doc2pdf'

$word_app = New-Object -ComObject Word.Application

# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {

    $document = $word_app.Documents.Open($_.FullName)

    $pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"

    $document.SaveAs([ref] $pdf_filename, [ref] 17)

    $document.Close()
}

$word_app.Quit()

This works for me (Word 2007):

$wdFormatPDF = 17
$word = New-Object -ComObject Word.Application
$word.visible = $false

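# Folder containing this script (only set when run from a saved .ps1 file)
$folderpath = Split-Path -parent $MyInvocation.MyCommand.Path

# -include "*.doc" matches only names ending in .doc; use "*.doc*" to pick up .docx as well
Get-ChildItem -path $folderpath -recurse -include "*.doc" | % {
    # Full path without the extension
    $path = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
    $doc = $word.documents.open($_.fullname)
    $doc.saveas($path, $wdFormatPDF)
    $doc.close()
}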

$word.Quit()

The above answers all fell short for me, as I was running a batch job converting around 70,000 Word documents this way. As it turns out, doing this repeatedly eventually leads to Word crashing, presumably due to memory issues (the error was some COMException that I didn't know how to parse). So, my hack to get it to proceed was to kill and restart Word every 100 docs (an arbitrarily chosen number).
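
One way to make the "restart every 100 docs" idea explicit is to split the file list into batches up front and give each batch its own Word instance. The following is only a sketch of the batching step; Split-IntoBatches is a hypothetical helper name, not a built-in cmdlet, and the numbers stand in for file objects.

```powershell
# Hypothetical helper: split a list into consecutive batches of at most $Size items.
function Split-IntoBatches {
    param(
        [object[]] $Items,
        [int] $Size = 100
    )
    for ($i = 0; $i -lt $Items.Count; $i += $Size) {
        $last = [Math]::Min($i + $Size, $Items.Count) - 1
        # The leading comma emits each batch as one array instead of unrolling it
        , $Items[$i..$last]
    }
}

# Stand-in data: 250 numbers instead of 250 files
$batches = @(Split-IntoBatches -Items (1..250) -Size 100)
$batches | ForEach-Object { "batch with $($_.Count) items" }
```

In a real run, the loop over $batches would create Word with New-Object -ComObject Word.Application at the start of each batch and call Quit() at the end of it.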

Additionally, when it did occasionally crash, it would leave behind malformed PDFs, each of which was generally 1-2 KB in size. So, when skipping already generated PDFs, I make sure they are larger than 3 KB. If you don't want to skip already generated PDFs, you can delete that if statement.
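
The size check described above can be isolated into a small function, which makes the threshold easy to change. This is a minimal sketch; Test-PdfLooksComplete is a hypothetical name, and the 3 KB default is simply the value mentioned in the text, not a general rule for what a valid PDF looks like.

```powershell
# Hypothetical helper: $true when the PDF exists and is larger than $MinBytes.
function Test-PdfLooksComplete {
    param(
        [string] $Path,
        [long] $MinBytes = 3KB   # threshold taken from the text above
    )
    # A missing file always needs converting
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        return $false
    }
    # Small files are treated as leftovers from a crashed conversion
    return (Get-Item -LiteralPath $Path).Length -gt $MinBytes
}
```

A conversion loop could then skip a document when Test-PdfLooksComplete returns $true for its target PDF and convert it otherwise.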

Excuse me if my code doesn't look good; I don't generally use Windows and this was a one-off hack. So, here's the resulting code:

# Collect every file whose name matches *.doc* (.doc, .docx, ...) under the folder
$Files=Get-ChildItem -path '.\path\to\docs' -recurse -include "*.doc*"

$counter = 0
$filesProcessed = 0
$Word = New-Object -ComObject Word.Application

Foreach ($File in $Files) {
    $Name="$(($File.FullName).substring(0, $File.FullName.lastIndexOf("."))).pdf"
    # Skip PDFs that already exist, unless they are small enough to be crash leftovers
    if ((Test-Path $Name) -And (Get-Item $Name).length -gt 3kb) {
        echo "skipping $($Name), already exists"
        continue
    }

    echo "$($filesProcessed): processing $($File.FullName)"
    $Doc = $Word.Documents.Open($File.FullName)
    $Doc.SaveAs($Name, 17)   # 17 = wdFormatPDF
    $Doc.Close()
    $counter = $counter + 1
    $filesProcessed = $filesProcessed + 1
    # Restart Word after every 100 converted documents
    if ($counter -ge 100) {
        $counter = 0
        $Word.Quit()
        # ReleaseComObject returns the remaining reference count; discard it
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word) | Out-Null
        $Word = New-Object -ComObject Word.Application
    }
}

# Close the last Word instance
$Word.Quit()

None of the solutions posted here worked for me on Windows 8.1 (by the way, I'm using Office 365). My PowerShell somehow does not like the [ref] arguments (I don't know why; I use PowerShell very rarely).

This is the solution that worked for me:

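$Files=Get-ChildItem 'C:\path\to\files\*.docx'

$Word = New-Object -ComObject Word.Application

Foreach ($File in $Files) {
    $Doc = $Word.Documents.Open($File.FullName)
    # Swap only the extension; replace('docx', 'pdf') would also rewrite folder names containing "docx"
    $Name=[System.IO.Path]::ChangeExtension($Doc.FullName, '.pdf')
    $Doc.SaveAs($Name, 17)
    $Doc.Close()
}

# Quit Word so no Word process is left running
$Word.Quit()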
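
When deriving the PDF name, a plain string replacement of 'docx' with 'pdf' rewrites every occurrence in the path, not just the extension. A small sketch of the difference, using a hypothetical path and .NET's [System.IO.Path]::ChangeExtension:

```powershell
# Hypothetical path, chosen so that 'docx' also appears in a folder name
$source = 'C:\docx\report.docx'

# String replacement rewrites every occurrence, including the folder
$byReplace = $source.Replace('docx', 'pdf')

# ChangeExtension only touches the final extension
$byChangeExtension = [System.IO.Path]::ChangeExtension($source, '.pdf')

$byReplace            # C:\pdf\report.pdf
$byChangeExtension    # C:\docx\report.pdf
```

With the replacement approach, SaveAs would be pointed at a folder that may not exist, so changing only the extension is the safer choice whenever the path itself might contain the string 'docx'.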