Excel VBA to Search for Text in PDF and Extract and Name Pages


情深已故 2020-12-17 07:23

I have the following code, which looks at each cell in column A of my spreadsheet, searches for the text it finds there in the specified PDF and then extracts the page where

3 Answers

被撕碎了的回忆 (OP)

2020-12-17 07:37

Loops are excellent for some things, but they can bog down processing when there are this many lookups. Recently, a colleague and I were doing a similar task (not PDF-related though), and we had a lot of success using the Range.Find method instead of a loop that runs InStr on each cell.

Some points of interest:

-To mimic the “loop cells” functionality when using the .Find method, we ended our range statement with .Cells, as seen below:

ActiveSheet.UsedRange.Cells.Find( )

The string to search for goes inside the parentheses.

-The return value: “A Range object that represents the first cell where that information is found.”

Once the .find method returns a range, a subsequent subroutine can extract the page number and document name.

-If you need to find the nth occurrence, “You can use the FindNext and FindPrevious methods to repeat the search.” (Microsoft)

Microsoft overview of range.find: https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-find-method-excel
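To make the Find/FindNext pattern concrete, here is a minimal sketch that collects the address of every cell containing a string. FindAllAddresses is a hypothetical helper name, not something from the original code, and the choice of LookAt:=xlPart (match anywhere inside a cell) and MatchCase:=False are assumptions about what you want. Because FindNext wraps around to the start of the range, the loop stops once it comes back to the first hit.

```vba
Option Explicit

' Sketch only: FindAllAddresses is a hypothetical helper name.
' Returns the address of every cell in ws.UsedRange that contains searchText.
Function FindAllAddresses(ByVal ws As Worksheet, ByVal searchText As String) As Collection
    Dim hits As New Collection
    Dim hit As Range
    Dim firstAddress As String

    With ws.UsedRange.Cells
        ' xlPart matches the text anywhere inside a cell; xlWhole would
        ' require the cell to contain nothing but searchText.
        Set hit = .Find(What:=searchText, LookIn:=xlValues, _
                        LookAt:=xlPart, MatchCase:=False)

        If Not hit Is Nothing Then
            firstAddress = hit.Address
            Do
                hits.Add hit.Address
                Set hit = .FindNext(hit)
                If hit Is Nothing Then Exit Do
            Loop While hit.Address <> firstAddress   ' stop after wrapping around
        End If
    End With

    Set FindAllAddresses = hits
End Function
```

For example, FindAllAddresses(ActiveSheet, "invoice").Count gives the number of matching cells, and an empty collection means the string was not found.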

So with this approach, you can loop over the cells in your list and execute the .Find method once for each string.
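A rough sketch of that outer loop follows. The sheet names "List" and "Data", the use of column A for the search strings, and writing the result into column B are all assumptions made for illustration; adjust them to your workbook.

```vba
Option Explicit

' Sketch only: sheet names and column layout are assumed, not from the original post.
Sub FindEachListedString()
    Dim listSheet As Worksheet
    Dim dataSheet As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Dim term As String
    Dim hit As Range

    Set listSheet = ThisWorkbook.Worksheets("List")   ' search strings in column A
    Set dataSheet = ThisWorkbook.Worksheets("Data")   ' text to be searched

    lastRow = listSheet.Cells(listSheet.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        term = CStr(listSheet.Cells(r, "A").Value)
        If Len(term) > 0 Then
            ' One Find call per search string replaces a cell-by-cell InStr loop.
            Set hit = dataSheet.UsedRange.Cells.Find(What:=term, LookIn:=xlValues, _
                                                     LookAt:=xlPart, MatchCase:=False)
            If hit Is Nothing Then
                listSheet.Cells(r, "B").Value = "not found"
            Else
                ' Record where the first match was found.
                listSheet.Cells(r, "B").Value = hit.Address(False, False)
            End If
        End If
    Next r
End Sub
```

The loop still exists, but it runs once per search string rather than once per cell per search string.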

The downside, I assume, is that the text must already be inside the Excel application. I also have not tested whether the string has to be the only content of the cell to match (I don’t think this is a concern).

'===================

Another suggestion that might be beneficial is to first bulk-rip all text from the .pdf with as little looping as possible (direct actions at the document object level). Then your find/return approach can be applied to the bulk text.
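One way this could look, as a sketch under stated assumptions: suppose the PDF text has already been exported to a plain-text file by an external tool that puts a form feed character between pages (pdftotext does this by default). The file can then be read in one call and split into pages, and each search string checked against whole pages. ReadAllText and PagesContaining are hypothetical helper names, and the file is assumed to be ANSI/ASCII encoded.

```vba
Option Explicit

' Sketch only: assumes an ANSI/ASCII text export of the PDF already exists.
' Reads the entire file into one string with a single Get call.
Function ReadAllText(ByVal filePath As String) As String
    Dim f As Integer
    Dim buffer As String

    f = FreeFile
    Open filePath For Binary Access Read As #f
    buffer = Space$(LOF(f))      ' size the buffer to the file length
    Get #f, , buffer
    Close #f

    ReadAllText = buffer
End Function

' Returns the 1-based page numbers whose text contains searchText.
' Assumes pages are separated by form feed characters (vbFormFeed).
Function PagesContaining(ByVal allText As String, ByVal searchText As String) As Collection
    Dim pages() As String
    Dim result As New Collection
    Dim i As Long

    pages = Split(allText, vbFormFeed)
    For i = LBound(pages) To UBound(pages)
        If InStr(1, pages(i), searchText, vbTextCompare) > 0 Then
            result.Add i + 1     ' Split returns a 0-based array
        End If
    Next i

    Set PagesContaining = result
End Function
```

The page numbers returned this way could then be handed to whatever routine extracts and names the pages. The file is read only once, however many strings are searched for.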

I did something similar when creating study notes from a professor’s PowerPoints: I dumped all the text into a .txt file, then returned every sentence containing any of a list of strings.

'=====================

A few caveats: I admit that I have not done any parsing at the scale of your project, so my suggestions might not be advantageous in practice.

Also, I have not done much work parsing .pdf documents, as I prefer to work with .txt files or data already in Excel whenever I can.

Good luck in your endeavors; I hope I was able to at least provide food for thought!
