Batch OCRing PDFs that haven't already been OCR'd
Question: I have 10,000 PDFs. Some have been fully OCRed, and others have one page that has been OCRed while the rest of the pages have not. How can I go through all of the PDFs and OCR only the pages that haven't already been done?

Answer 1: This is exactly what I was looking for. I have thousands of scanned PDF files, some of which were already OCRed and some of which were not. So I combined information I found on fora and Stack Overflow and made my own solution that does exactly that, which I have summarized for