Excel VBA to Search for Text in PDF and Extract and Name Pages

情深已故 2020-12-17 07:23

I have the following code, which looks at each cell in column A of my spreadsheet, searches for the text it finds there in the specified PDF and then extracts the page where

3 Answers
  •  没有蜡笔的小新
    2020-12-17 07:39

    Sorry to post a quick, incomplete answer, but I think I can point you in a good direction.

    Instead of making the system look up the two terms hundreds of billions of times and then make hundreds of billions of comparisons, put your search terms into an array and the text of each page into one long string. Then it only has to do one lookup and 200 comparisons per page.

    'Declare the clipboard API functions (at the top of a standard module)
    'With PtrSafe, the window handle should be a LongPtr
    Public Declare PtrSafe Function OpenClipboard Lib "user32" (ByVal hwnd As LongPtr) As Long
    Public Declare PtrSafe Function EmptyClipboard Lib "user32" () As Long
    Public Declare PtrSafe Function CloseClipboard Lib "user32" () As Long
    
    '...
    
    'MSForms.DataObject requires a reference to the Microsoft Forms 2.0 Object Library
    Dim objData As New MSForms.DataObject
    Dim arrSearch() As String
    Dim strTxt As String
    
    '...
    
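    'Create array of search terms from column A (rows 2 to lastrow)
    ReDim arrSearch(0 To lastrow - 2) 'Size the dynamic array before filling it
    For i = 2 To lastrow
        arrSearch(i - 2) = Sheets("Sheet1").Cells(i, 1).Value 'Cells(row, column)
    Next i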
    
    For page = 0 To objPDDoc.GetNumPages - 1
    
        '[Move each page into a new document. You already have that code]
    
        'Clear clipboard
        OpenClipboard (0&)
        EmptyClipboard
        CloseClipboard
    
        'Copy page to clipboard
        objApp.MenuItemExecute ("SelectAll")
        objApp.MenuItemExecute ("Copy")
        'You can also do this with the JavaScript object: objjso.ExecMenuItem("Item Name")
        'You may have to insert a waiting function like sleep() here to wait for the action to complete
    
        'Put data from clipboard into a string.
        objData.GetFromClipboard
        strTxt = objData.GetText 'Now you can search the entire content of the page at once, within memory
    
        'Compare each element of the array to the string
        For i = LBound(arrSearch) To UBound(arrSearch)
            If InStr(1, strTxt, arrSearch(i)) > 0 Then
                '[You found a match. Your code here]
            End If
        Next i
    
    Next page
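
    The wait mentioned in the comments above could be sketched roughly as follows. This is only a sketch, not tested against Acrobat: WaitForClipboardText is a hypothetical helper name, and the 100 ms poll interval and 2000 ms timeout are arbitrary assumptions. It polls the clipboard until some text shows up instead of sleeping for a fixed time.

    ```vba
    'Assumption: this declaration sits with the clipboard declarations at the top of the module
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

    'Hypothetical helper: polls the clipboard until it holds text or the timeout expires.
    'Returns an empty string if nothing arrived in time.
    Public Function WaitForClipboardText(ByVal objData As MSForms.DataObject, _
                                         Optional ByVal timeoutMs As Long = 2000) As String
        Dim waited As Long
        Dim txt As String

        On Error Resume Next 'GetText raises an error while the clipboard holds no text
        Do
            txt = vbNullString
            objData.GetFromClipboard
            txt = objData.GetText
            If Len(txt) > 0 Then Exit Do
            Sleep 100 'Poll interval in milliseconds (arbitrary choice)
            waited = waited + 100
        Loop While waited < timeoutMs
        On Error GoTo 0

        WaitForClipboardText = txt
    End Function
    ```

    Inside the page loop you would then write strTxt = WaitForClipboardText(objData) in place of the GetFromClipboard / GetText pair, and skip the page (or retry the copy) when the result is empty.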
    

    This is still cumbersome because you have to open each page in a new document. If there is a good way to determine which page you're on purely from the text (such as the page number at the bottom of page a, followed immediately by the header at the top of page b), then you might look at copying the entire text of the document into one string and using those clues to decide which page to extract once you find a match. That would be a lot faster, I believe.
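
    As a rough illustration of that idea, here is a minimal sketch that works out the page from the position of a match in the full text. It assumes, hypothetically, that some fixed marker string (pageMarker) appears exactly once at the end of every page's text. Real footers such as page numbers change from page to page, so you would probably need a pattern match instead. PageIndexOfTerm is a made-up name, not part of any library.

    ```vba
    'Hypothetical helper: returns the 0-based index of the page that contains the first
    'occurrence of term, or -1 if term is not found.
    'Assumption: pageMarker occurs exactly once, at the end of each page's text.
    Public Function PageIndexOfTerm(ByVal fullText As String, ByVal term As String, _
                                    ByVal pageMarker As String) As Long
        Dim hitPos As Long
        Dim markerPos As Long
        Dim pageIndex As Long

        PageIndexOfTerm = -1
        If Len(term) = 0 Or Len(pageMarker) = 0 Then Exit Function

        hitPos = InStr(1, fullText, term)
        If hitPos = 0 Then Exit Function

        'Count the page markers that start before the match does
        markerPos = InStr(1, fullText, pageMarker)
        Do While markerPos > 0 And markerPos < hitPos
            pageIndex = pageIndex + 1
            markerPos = InStr(markerPos + Len(pageMarker), fullText, pageMarker)
        Loop

        PageIndexOfTerm = pageIndex
    End Function
    ```

    For example, PageIndexOfTerm("alpha<<P>>beta<<P>>", "beta", "<<P>>") returns 1, which you could pass to your existing page-extraction code. The "<<P>>" marker is only a placeholder for whatever boundary text your PDF actually produces.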
