Loop over PDF files and transform them into doc with Word

本小妞迷上赌 submitted on 2020-01-03 08:40:08

Question


I am trying to use VBA, which I am pretty new to, to produce a series of .doc documents from PDFs (which are not images); that is, I want to loop over various PDF files and save them in MS Word format. In my experience, Word reads my PDF documents pretty well: it preserves the correct layout of the PDF file most of the time. I am not sure this is the right way to tackle the problem, so I would also welcome an alternative suggestion, using R if possible.

Anyway, here is the code, which I found here:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf") 'pdf path

   Do While (file <> "")

   ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

          Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

    ChangeFileOpenDirectory "C:\Users\username\work_dir_example"

    ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

After pasting it into the developer's window, I save the code in a module, close the developer's window, click the "Macros" button, and run the "convertToWord" macro. A pop-up box then shows the following error: "Sub or Function not defined". How do I fix this? Earlier, for a reason that is not clear to me now, I also got an error about the function ChangeFileOpenDirectory, which likewise seemed not to be defined.

Update 27/08/2017

I changed the code to the following:

Sub convertToWord()

   Dim MyObj As Object, MySource As Object, file As Variant

   file = Dir("C:\Users\username\work_dir_example" & "*.pdf")

   ChDir "C:\Users\username\work_dir_example"

   Do While (file <> "")

        Documents.Open Filename:=file, ConfirmConversions:=False, ReadOnly:= _
        False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
        "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
        Format:=wdOpenFormatAuto, XMLTransform:=""

        ActiveDocument.SaveAs2 Filename:=Replace(file, ".pdf", ".docx"), FileFormat:=wdFormatXMLDocument _
        , LockComments:=False, Password:="", AddToRecentFiles:=True, _
        WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
         SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
        False, CompatibilityMode:=15

    ActiveDocument.Close

     file = Dir

   Loop

End Sub

Now I do not get any error messages in a pop up box, but there is no output in my working directory. What might be wrong with it right now?


Answer 1:


Any language that can read PDF files and write Word docs (which are XML) can do this, but the conversion you like (which Word does when the PDF is opened) will require using an API for the application itself. VBA is your easy option.

The snippets you've posted (and my samples below) use early binding and enumerated constants, which means we need a reference to the Word object library. That is already set up for any code you write in a Word document, so create a new Word document and add the code in a standard module. (See this Excel tutorial if you need more details; the steps for our process are the same.)

You can run your macro from the VB Editor (using the Run button) or from the normal document window (click the Macros button on the View tab in Word 2010-2016). Save your document as a DOCM file if you want to reuse the macro without setting up the code again.

Now for the code!

As stated in comments, your second snippet is valid if you just ensure that your folder paths end with a backslash "\" character. It's still not great code after you fix that, but that'll get you up and running.
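To see why the missing backslash matters, here is a minimal sketch that prints the search pattern built with and without the trailing separator. The Sub name ShowPathProblem is made up for illustration, and the folder is the placeholder path from the question. Without the separator, the wildcard is appended to the folder name itself, so Dir most likely finds nothing and returns an empty string; the loop body then never runs, which would explain getting no error and no output.

```vba
Sub ShowPathProblem()
    Dim folder As String
    'Placeholder folder taken from the question (no trailing backslash)
    folder = "C:\Users\username\work_dir_example"

    'Wildcard glued onto the folder name: Dir would search C:\Users\username
    'for file names that start with "work_dir_example" and end in ".pdf"
    Debug.Print folder & "*.pdf"

    'With the separator, Dir searches inside the folder itself
    Debug.Print folder & "\" & "*.pdf"
End Sub
```

Run it from the VB Editor and compare the two lines in the Immediate window (Ctrl+G).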

I'll assume you want to go the extra mile and have a well-written version of this you could repurpose or expand upon later. For simplicity, we'll use two procedures: the main conversion and a procedure to suppress the PDF conversion warning dialog (controlled by the registry).

Main procedure:

Sub ConvertPDFsToWord2()
    Dim path As String
    'Manually edit path in the next line before running
    path = "C:\users\username\work_dir_example\"

    Dim file As String
    Dim doc As Word.Document
    Dim regValPDF As Integer
    Dim originalAlertLevel As WdAlertLevel

    'Generate string for getting all PDFs with Dir command
    'Check for terminal \
    If Right(path, 1) <> "\" Then path = path & "\"
    'Append file type with wildcard
    file = path & "*.pdf"

    'Get path for first PDF (blank string if no PDFs exist)
    file = Dir(file)

    originalAlertLevel = Application.DisplayAlerts
    Application.DisplayAlerts = wdAlertsNone

    If file <> "" Then regValPDF = TogglePDFWarning(1)

    Do While file <> ""
        'Open method will automatically convert PDF for editing
        Set doc = Documents.Open(path & file, False)

        'Save and close document
        doc.SaveAs2 path & Replace(file, ".pdf", ".docx"), _
                    fileformat:=wdFormatDocumentDefault
        doc.Close False

        'Get path for next PDF (blank string if no PDFs remain)
        file = Dir
    Loop

CleanUp:
    On Error Resume Next 'Ignore errors during cleanup
    doc.Close False
    'Restore registry value, if necessary
    If regValPDF <> 1 Then TogglePDFWarning regValPDF
    Application.DisplayAlerts = originalAlertLevel

End Sub

Registry setting function:

Private Function TogglePDFWarning(newVal As Integer) As Integer
'This function reads and writes the registry value that controls
'the dialog displayed when Word opens (and converts) a PDF file
    Dim wShell As Object
    Dim regKey As String
    Dim regVal As Variant

    'setup shell object and string for key
    Set wShell = CreateObject("WScript.Shell")
    regKey = "HKCU\SOFTWARE\Microsoft\Office\" & _
             Application.Version & "\Word\Options\"

    'Get existing registry value, if any
    On Error Resume Next 'Ignore error if reg value does not exist
    regVal = wShell.RegRead(regKey & "DisableConvertPdfWarning")
    On Error GoTo 0      'Break on errors after this point

    wShell.regwrite regKey & "DisableConvertPdfWarning", newVal, "REG_DWORD"

    'Return original setting / registry value (0 if omitted)
    If Err.Number <> 0 Or regVal = 0 Then
        TogglePDFWarning = 0
    Else
        TogglePDFWarning = 1
    End If

End Function



Answer 2:


As others have stated, the problem seems to lie mostly with the path & file name. Here is the second version of the code you posted with some changes.

Unfortunately, a warning message pops up and setting DisplayAlerts to false will not suppress it. But if you click the "don't show this message again" checkbox the first time it pops up, then it will not continue to pop up for every file.

Sub convertToWord()

    Dim MyObj       As Object
    Dim MySource    As Object
    Dim file        As String
    Dim path        As String

    path = "C:\Users\username\work_dir_example\"
    file = Dir(path & "*.pdf")

    Do While (file <> "")
        Documents.Open FileName:=path & file
        With ActiveDocument
            .SaveAs2 FileName:=Replace(path & file, ".pdf", ".docx"), _
                                FileFormat:=wdFormatXMLDocument
            .Close
        End With
        file = Dir
    Loop

End Sub
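
One caveat with Replace(file, ".pdf", ".docx"): Replace is case-sensitive by default and replaces every occurrence, so a file named REPORT.PDF would keep its original name, and a name such as my.pdf.notes.pdf would be changed in two places. Below is a minimal sketch of a helper that swaps only a trailing extension, whatever its letter case. DocxNameFor is a hypothetical name, not part of the code above.

```vba
Function DocxNameFor(pdfName As String) As String
    'Swap a trailing ".pdf" (any letter case) for ".docx"
    If LCase$(Right$(pdfName, 4)) = ".pdf" Then
        DocxNameFor = Left$(pdfName, Len(pdfName) - 4) & ".docx"
    Else
        'No ".pdf" ending found: just append the new extension
        DocxNameFor = pdfName & ".docx"
    End If
End Function
```

Inside the loop it could be used as .SaveAs2 FileName:=path & DocxNameFor(file), FileFormat:=wdFormatXMLDocument.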


Source: https://stackoverflow.com/questions/45890170/loop-over-pdf-files-and-transform-them-into-doc-with-word
