“Not implemented” Exception when using pywin32 to control Adobe Acrobat

陌清茗 2020-12-30 08:26

I have written a Python script that uses pywin32 to save PDF files as text. It worked fine until recently. I use similar methods in Excel. The code is below:

2 answers
  •  旧巷少年郎
    2020-12-30 09:00

    Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html

    I admit that the post above is not the easiest to find (probably because Google ranks it low due to its age).

    Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html

    For reference, here is a complete piece of code that does not require you to patch dynamic.py by hand (the snippet should run pretty much out of the box):

    # Finds all files under ROOT_INPUT_PATH that match INPUT_FILE_EXTENSION and tries to
    # extract their text into ROOT_OUTPUT_PATH, keeping each input file's base name but
    # using OUTPUT_FILE_EXTENSION as the extension.
    from win32com.client import Dispatch
    from win32com.client.dynamic import ERRORS_BAD_CONTEXT
    
    import winerror
    
    # Try importing scandir and use it if available: its walk() can be much faster than the
    # stock os.walk (since Python 3.5, os.walk is itself built on os.scandir).
    try:
        from scandir import walk
    except ImportError:
        from os import walk
    
    import fnmatch
    
    import sys
    import os
    
    ROOT_INPUT_PATH = None
    ROOT_OUTPUT_PATH = None
    INPUT_FILE_EXTENSION = "*.pdf"
    OUTPUT_FILE_EXTENSION = ".txt"
    
    def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
        avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat
    
        # Open the input file (as a pdf)
        ret = avDoc.Open(f_path, f_path)
        assert ret  # FIXME: The documentation says "-1 if the file was opened successfully, 0 otherwise", but in practice this appears to be a bool?
    
        pdDoc = avDoc.GetPDDoc()
    
        dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))
    
        # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
        jsObject = pdDoc.GetJSObject()
    
        # Other conversion IDs can be passed here to save in other formats, for instance "com.adobe.acrobat.xml"
        jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")
    
        pdDoc.Close()
        # True is the bNoSave argument (close without saving). Closing after every file matters:
        # otherwise Acrobat refuses to process further files once a certain number of documents
        # are open (for example 50 PDFs).
        avDoc.Close(True)
        del pdDoc
    
    if __name__ == "__main__":
        assert(5 == len(sys.argv)), sys.argv # 
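
The listing above is cut off before the imported `ERRORS_BAD_CONTEXT` and `winerror` are used. Going by the linked mailing-list advice, one plausible use (an assumption, not the author's confirmed code) is to add `E_NOTIMPL` to the list that `win32com.client.dynamic` consults when a property lookup fails, and to do so before the first call to `acrobat_extract_text`. The helper name `allow_hresult` is hypothetical, and the constant is written out so the sketch also loads where pywin32 is not installed:

```python
# Hedged sketch: make pywin32's dynamic dispatch tolerate "Not implemented".
# Assumption: the truncated script appended winerror.E_NOTIMPL to
# ERRORS_BAD_CONTEXT at runtime instead of editing dynamic.py on disk.

E_NOTIMPL = -2147467263  # HRESULT 0x80004001 as a signed 32-bit integer


def allow_hresult(bad_context_errors, hresult=E_NOTIMPL):
    """Append hresult to the list once; return True if it was added."""
    if hresult in bad_context_errors:
        return False
    bad_context_errors.append(hresult)
    return True


try:
    # pywin32 is only available on Windows.
    from win32com.client.dynamic import ERRORS_BAD_CONTEXT
except ImportError:
    ERRORS_BAD_CONTEXT = None

if ERRORS_BAD_CONTEXT is not None:
    # Mutates the list object that win32com.client.dynamic itself uses.
    allow_hresult(ERRORS_BAD_CONTEXT)
```

Because the list is mutated in place, the change lasts only for the running process and leaves the installed pywin32 files untouched.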
    
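The header comment of the listing describes collecting every file under `ROOT_INPUT_PATH` that matches `INPUT_FILE_EXTENSION`, but the part that does this is not shown. A minimal sketch of that step, assuming the `walk` and `fnmatch` imports are used for it (the function name `find_matching_files` is hypothetical):

```python
import fnmatch
import os


def find_matching_files(root, pattern):
    """Yield (full_path, basename_without_extension) for matching files.

    root    -- directory to search recursively
    pattern -- glob-style pattern such as "*.pdf"
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            full_path = os.path.join(dirpath, filename)
            basename = os.path.splitext(filename)[0]
            yield full_path, basename
```

Each yielded pair could then be passed along as `acrobat_extract_text(full_path, ROOT_OUTPUT_PATH, basename, OUTPUT_FILE_EXTENSION)`.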
                                     
                  