Excel VBA to Search for Text in PDF and Extract and Name Pages

情深已故 2020-12-17 07:23

I have the following code, which looks at each cell in column A of my spreadsheet, searches for the text it finds there in the specified PDF and then extracts the page where

3 Answers
  •  被撕碎了的回忆
    2020-12-17 07:37

    Loops are excellent for some things, but they can bog down processing on larger searches like this one. Recently, a colleague and I did a similar task (not PDF-related, though), and we had much success using the Range.Find method instead of a loop that runs InStr on each cell.

    Some points of interest:

    - To mimic the “loop cells” functionality when using the .Find method, we ended our range statement with .Cells, as seen below:

    ActiveSheet.UsedRange.Cells.Find( )

    where the desired string goes inside the parentheses.

    - The return value: “A Range object that represents the first cell where that information is found.”

    Once the .Find method returns a range, a subsequent subroutine can extract the page number and document name.

    - If you need to find the nth occurrence, “You can use the FindNext and FindPrevious methods to repeat the search.” (Microsoft)

    Microsoft overview of range.find: https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-find-method-excel
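
    To make the points above concrete, here is a minimal sketch of .Find followed by .FindNext. The procedure name FindAllMatches and the search string "invoice" are made up for illustration and are not from the original question; treat it as a starting point rather than a tested solution.

    ```vba
    Option Explicit

    ' Sketch: print the address of every cell in the used range that contains searchText.
    ' FindAllMatches and searchText are hypothetical names chosen for this example.
    Sub FindAllMatches()
        Const searchText As String = "invoice"   ' assumed example string
        Dim searchArea As Range
        Dim hit As Range
        Dim firstAddress As String

        Set searchArea = ActiveSheet.UsedRange.Cells

        ' LookIn and LookAt are passed explicitly because Excel remembers
        ' the settings from the previous search otherwise.
        Set hit = searchArea.Find(What:=searchText, LookIn:=xlValues, _
                                  LookAt:=xlPart, MatchCase:=False)

        ' Find returns Nothing when there is no match, so check before using it.
        If hit Is Nothing Then
            Debug.Print "No match for " & searchText
            Exit Sub
        End If

        firstAddress = hit.Address
        Do
            Debug.Print hit.Address & ": " & hit.Value
            ' FindNext wraps around, so stop once we are back at the first match.
            Set hit = searchArea.FindNext(hit)
            If hit Is Nothing Then Exit Do
        Loop While hit.Address <> firstAddress
    End Sub
    ```

    The Nothing check sits on its own line because VBA evaluates both sides of an And, so combining it with the address comparison in the Loop While condition could raise an error.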

    So with this approach, you can loop over the cells in your list and execute the .Find method once for each string.

    The downside is (I assume) that this must be done on text within the Excel application. Also, I have not tested whether the string has to occupy the cell by itself (I don’t think this is a concern).
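
    The per-string loop described above might look something like the sketch below. The sheet names "List" and "PdfText" are assumptions for illustration (the second one stands for wherever the text to be searched ends up inside Excel), as is the procedure name. LookAt:=xlPart asks .Find to match the string anywhere inside a cell's value, so the string should not need to occupy the cell by itself.

    ```vba
    Option Explicit

    ' Sketch: for each search string in column A of a list sheet, report the first
    ' cell on a text sheet that contains it. The sheet names are assumed.
    Sub FindEachListedString()
        Dim listSheet As Worksheet
        Dim textSheet As Worksheet
        Dim lastRow As Long
        Dim r As Long
        Dim term As String
        Dim hit As Range

        Set listSheet = ThisWorkbook.Worksheets("List")      ' hypothetical sheet
        Set textSheet = ThisWorkbook.Worksheets("PdfText")   ' hypothetical sheet

        ' Last used row in column A of the list
        lastRow = listSheet.Cells(listSheet.Rows.Count, "A").End(xlUp).Row

        For r = 1 To lastRow
            term = CStr(listSheet.Cells(r, "A").Value)
            If Len(term) > 0 Then
                ' One .Find call per term replaces an InStr test on every cell
                Set hit = textSheet.UsedRange.Cells.Find(What:=term, LookIn:=xlValues, _
                                                         LookAt:=xlPart, MatchCase:=False)
                If hit Is Nothing Then
                    Debug.Print term & ": not found"
                Else
                    Debug.Print term & ": " & hit.Address(External:=True)
                End If
            End If
        Next r
    End Sub
    ```

    The results go to the Immediate window here so that the sketch does not overwrite anything on the sheets.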

    '===================

    Another suggestion that might help is to first bulk-extract all of the text from the .pdf with as little looping as possible (direct actions at the document-object level). Your find/return approach can then be applied to the bulk text.

    I did something similar when creating study notes from a professor’s PowerPoints: I grabbed all the text into a .txt file, then returned every sentence containing any of a list of strings.
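
    As a rough sketch of that text-file approach (not the code I actually used), the following reads a whole .txt file in one go and prints each sentence that contains any of the search strings. The file path, the search terms, and the procedure name are placeholders; it assumes an ANSI-encoded file and splits sentences naively on periods.

    ```vba
    Option Explicit

    ' Sketch: print every sentence in a text file that contains any search term.
    ' The path and terms are placeholders; splitting on "." is deliberately naive.
    Sub PrintMatchingSentences()
        Const filePath As String = "C:\Temp\extracted.txt"   ' hypothetical path
        Dim terms As Variant
        Dim fileNum As Integer
        Dim allText As String
        Dim sentences() As String
        Dim sentence As String
        Dim i As Long
        Dim t As Long

        terms = Array("budget", "deadline")                  ' hypothetical terms

        ' Read the whole file at once instead of looping line by line
        fileNum = FreeFile
        Open filePath For Binary Access Read As #fileNum
        allText = Space$(LOF(fileNum))
        Get #fileNum, , allText
        Close #fileNum

        ' Flatten line breaks so a sentence can span several lines
        allText = Replace(Replace(allText, vbCr, " "), vbLf, " ")
        sentences = Split(allText, ".")

        For i = LBound(sentences) To UBound(sentences)
            sentence = Trim$(sentences(i))
            For t = LBound(terms) To UBound(terms)
                ' vbTextCompare makes the match case-insensitive
                If InStr(1, sentence, terms(t), vbTextCompare) > 0 Then
                    Debug.Print sentence & "."
                    Exit For   ' print each sentence once, even if several terms match
                End If
            Next t
        Next i
    End Sub
    ```

    Abbreviations and decimal numbers will confuse the period split, so a real run would probably need a smarter sentence boundary rule.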

    '=====================

    A few caveats: I admit that I have not done any parsing at the scale of your project, so my suggestions might not pay off in practice.

    Also, I have not done much work parsing .pdf documents, as I try to work with .txt files or the Excel application first wherever I can.

    Good luck in your endeavors; I hope I was able to at least provide food for thought!
