Excel VBA to Search for Text in PDF and Extract and Name Pages

情深已故 2020-12-17 07:23

I have the following code, which looks at each cell in column A of my spreadsheet, searches for the text it finds there in the specified PDF and then extracts the page where

3 answers

没有蜡笔的小新 (original poster)

2020-12-17 07:39

Sorry to post a quick, incomplete answer, but I think I can point you in a good direction.

Instead of making the system look up the two terms hundreds of billions of times and then make hundreds of billions of comparisons, put your search terms into an array and the text of each page into one long string. Then it only has to do one lookup and 200 comparisons per page.

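'Declare the clipboard API functions (module level)
'With PtrSafe, window handles must be LongPtr so the declaration also works in 64-bit Office
Public Declare PtrSafe Function OpenClipboard Lib "user32" (ByVal hwnd As LongPtr) As Long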
Public Declare PtrSafe Function EmptyClipboard Lib "user32" () As Long
Public Declare PtrSafe Function CloseClipboard Lib "user32" () As Long

'...

Dim objData As New MSForms.DataObject 'Requires a reference to the Microsoft Forms 2.0 Object Library
Dim arrSearch() As String
Dim strTxt As String

'...

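'Create array of search terms from column A (rows 2 to lastrow; assumes lastrow >= 2)
ReDim arrSearch(0 To lastrow - 2) 'A dynamic array must be sized before it is filled
For i = 2 To lastrow
    arrSearch(i - 2) = CStr(Sheets("Sheet1").Cells(i, 1).Value) 'Cells(row, column): row i, column A
Next i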

For page = 0 To objPDDoc.GetNumPages - 1

    '[Move each page into a new document. You already have that code]

    'Clear clipboard
    OpenClipboard (0&)
    EmptyClipboard
    CloseClipboard

    'Copy page to clipboard
    objApp.MenuItemExecute ("SelectAll")
    objApp.MenuItemExecute ("Copy")
    'You can also do this with the JavaScript object: objjso.ExecMenuItem("Item Name")
    'You may have to insert a waiting function like sleep() here to wait for the action to complete

    'Put data from clipboard into a string.
    objData.GetFromClipboard
    strTxt = objData.GetText 'Now you can search the entire content of the page at once, within memory

    'Compare each element of the array to the string
    For i = LBound(arrSearch) To UBound(arrSearch)
        If InStr(1, strTxt, arrSearch(i)) > 0 Then
            '[You found a match. Your code here]
        End If
    Next i

Next page
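
The comment about inserting a wait after the Copy command could be handled with something like the following. This is only a sketch under assumptions: WaitForClipboardText is a hypothetical helper name, the 50 ms polling step and 2-second timeout are arbitrary, and it assumes a reference to the Microsoft Forms 2.0 Object Library. Sleep is the kernel32 API function, declared at module level like the clipboard functions.

```vba
' Sleep is exported by kernel32; the PtrSafe form is for VBA7 (Office 2010 and later).
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

' Hypothetical helper: polls the clipboard until text is available or the
' timeout expires. Returns the clipboard text, or "" on timeout.
Public Function WaitForClipboardText(Optional ByVal timeoutMs As Long = 2000) As String
    Const STEP_MS As Long = 50       ' polling interval (arbitrary choice)
    Const TEXT_FORMAT As Long = 1    ' text format identifier used by DataObject
    Dim objData As MSForms.DataObject
    Dim waited As Long
    Dim hasText As Boolean

    Set objData = New MSForms.DataObject
    Do While waited <= timeoutMs
        hasText = False
        On Error Resume Next         ' the clipboard may be briefly locked by another process
        objData.GetFromClipboard
        If Err.Number = 0 Then hasText = objData.GetFormat(TEXT_FORMAT)
        If Err.Number <> 0 Then hasText = False
        On Error GoTo 0

        If hasText Then
            WaitForClipboardText = objData.GetText(TEXT_FORMAT)
            Exit Function
        End If

        Sleep STEP_MS
        waited = waited + STEP_MS
    Loop
End Function
```

In the loop above, strTxt = WaitForClipboardText() could then stand in for the GetFromClipboard and GetText pair. A page with no selectable text would simply time out and return an empty string.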

This per-page approach is still cumbersome because each page has to be opened in a new document. If the text alone tells you which page you are on (for example, the page number at the bottom of one page followed immediately by the header at the top of the next), you could copy the entire text of the document into one string and then use those clues to decide which page to extract once you find a match. I believe that would be a lot faster.
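
As a rough illustration of the whole-document idea, here is a minimal sketch. It assumes (and this is purely an assumption about your PDF) that every page's text ends with a footer such as "Page 12", so the first footer after a match identifies the page containing it. PageOfFirstMatch and footerPrefix are hypothetical names, not part of any Acrobat API.

```vba
' Hypothetical helper: returns the page number on which searchTerm first occurs,
' or 0 if the term, or a footer following it, cannot be found.
' Assumes every page's text ends with a footer like "Page 12".
Public Function PageOfFirstMatch(ByVal docText As String, _
                                 ByVal searchTerm As String, _
                                 Optional ByVal footerPrefix As String = "Page ") As Long
    Dim matchPos As Long
    Dim footerPos As Long
    Dim i As Long
    Dim ch As String
    Dim digits As String

    If Len(searchTerm) = 0 Then Exit Function

    matchPos = InStr(1, docText, searchTerm, vbTextCompare)
    If matchPos = 0 Then Exit Function           ' term not in the document

    ' The first footer after the match closes the page that contains the match.
    footerPos = InStr(matchPos + Len(searchTerm), docText, footerPrefix, vbBinaryCompare)
    If footerPos = 0 Then Exit Function          ' no footer after the match

    ' Collect the digits that immediately follow the footer prefix.
    i = footerPos + Len(footerPrefix)
    Do While i <= Len(docText)
        ch = Mid$(docText, i, 1)
        If Not (ch Like "[0-9]") Then Exit Do
        digits = digits & ch
        i = i + 1
    Loop

    If Len(digits) > 0 Then PageOfFirstMatch = CLng(digits)
End Function
```

You would call it once per search term, for example pageNum = PageOfFirstMatch(strTxt, arrSearch(i)), where strTxt holds the text of the whole document. It only reports the first occurrence of each term, and it will give wrong answers if the footer text also appears in the body of a page, so check it against a few known pages first.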
