If I have 10,000 PDFs, some of which have been OCRed, some of which have 1 page that has been OCRed but the rest of the pages have not, how can I go through all the PDFs and
Unburying this thread.
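You can tell which PDF files have already been OCRed by checking them with pdffonts (part of Poppler). If it reports any fonts, the PDF very probably already has a text layer, i.e. it was OCRed (or was created digitally in the first place). pdffonts can also be limited to a page range with -f and -l, which helps with files where only some pages were OCRed.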
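Here is a minimal sketch of that check in Python, assuming Poppler's pdffonts and pdfinfo are installed and on PATH. It runs pdffonts once per page (via -f/-l), so files where only some pages were OCRed show up as "partial". The labels and helper names are made up for illustration, and a page that reports a font is not guaranteed to be fully OCRed, so treat the result as a heuristic.

```python
import subprocess
import sys
from pathlib import Path


def count_fonts(pdffonts_output: str) -> int:
    """Count the font rows pdffonts prints below its dashed separator line."""
    lines = pdffonts_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("---"):
            return sum(1 for row in lines[i + 1:] if row.strip())
    return 0


def classify(n_pages: int, missing: list[int]) -> str:
    """Label a PDF from the list of pages that report no fonts (hypothetical labels)."""
    if n_pages == 0:
        return "empty"
    if not missing:
        return "ocred"
    if len(missing) == n_pages:
        return "not_ocred"
    return "partial"


def page_count(pdf: Path) -> int:
    """Read the 'Pages:' line printed by pdfinfo."""
    out = subprocess.run(["pdfinfo", str(pdf)],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("Pages:"):
            return int(line.split()[1])
    return 0


def pages_without_fonts(pdf: Path, n_pages: int) -> list[int]:
    """Run pdffonts on each page separately and return the pages with no fonts."""
    missing = []
    for page in range(1, n_pages + 1):
        out = subprocess.run(
            ["pdffonts", "-f", str(page), "-l", str(page), str(pdf)],
            capture_output=True, text=True, check=True,
        ).stdout
        if count_fonts(out) == 0:
            missing.append(page)
    return missing


def main(root: Path) -> None:
    # Note: the glob is case-sensitive, so "*.PDF" files are not matched.
    for pdf in sorted(root.rglob("*.pdf")):
        try:
            n_pages = page_count(pdf)
            missing = pages_without_fonts(pdf, n_pages)
        except subprocess.CalledProcessError:
            print(f"error\t{pdf}")  # e.g. damaged or password-protected file
            continue
        label = classify(n_pages, missing)
        print(f"{label}\t{pdf}\t{' '.join(map(str, missing))}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(Path(sys.argv[1]))
    else:
        print(f"usage: {sys.argv[0]} PDF_DIRECTORY", file=sys.stderr)
```

Run it as e.g. python find_ocr.py /path/to/pdfs (the script name is hypothetical); it prints one tab-separated line per file with the label, the path and the pages without fonts. Spawning pdffonts once per page is slow over 10,000 files; running it once on the whole file first and only going page by page when it does find fonts would cut that down, since a file with no fonts at all is simply "not_ocred".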
As for batch processing, I wrote a small script that can batch-OCR files to PDF, Word, Excel or CSV output.
You can find it at https://github.com/deajan/pmOCR. pmOCR (poor man's OCR) is a wrapper for either the ABBYY OCR CLI for Linux or the open-source Tesseract 3 engine.
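pmOCR's own options aren't covered here, so the following is not pmOCR but a minimal sketch of the same idea that calls Poppler's pdfinfo, pdftoppm, pdfseparate and pdfunite plus the tesseract CLI directly (all assumed to be installed and on PATH). It OCRs only the pages you pass in, for example the pages the pdffonts check reported without fonts, and copies the other pages unchanged. The 300 dpi resolution, the "eng" language and the helper names are assumptions for illustration. OCRed pages are replaced by a rasterized image plus text layer, and their page size depends on the resolution Tesseract reads from the image, so check a few results before running it over everything.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run(*cmd: str) -> None:
    """Run an external tool and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def plan_pages(n_pages: int, to_ocr: set[int]) -> list[tuple[int, bool]]:
    """Return (page, needs_ocr) for every page, in document order."""
    return [(page, page in to_ocr) for page in range(1, n_pages + 1)]


def page_count(pdf: Path) -> int:
    """Read the 'Pages:' line printed by pdfinfo."""
    out = subprocess.run(["pdfinfo", str(pdf)],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("Pages:"):
            return int(line.split()[1])
    return 0


def ocr_pages(src: Path, dst: Path, to_ocr: set[int], lang: str = "eng") -> None:
    """Write dst: pages in to_ocr are OCRed with Tesseract, the rest are copied."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for page, needs_ocr in plan_pages(page_count(src), to_ocr):
            base = Path(tmp) / f"part-{page}"
            if needs_ocr:
                # Rasterize this page at 300 dpi to base.png (-singlefile: no page suffix).
                run("pdftoppm", "-r", "300", "-f", str(page), "-l", str(page),
                    "-png", "-singlefile", str(src), str(base))
                # Tesseract's "pdf" config writes a searchable single-page base.pdf.
                run("tesseract", f"{base}.png", str(base), "-l", lang, "pdf")
            else:
                # Copy the existing page; %d expands to the page number (part-N.pdf).
                run("pdfseparate", "-f", str(page), "-l", str(page),
                    str(src), str(Path(tmp) / "part-%d.pdf"))
            parts.append(f"{base}.pdf")
        # Reassemble every page in its original order.
        run("pdfunite", *parts, str(dst))


if __name__ == "__main__":
    if len(sys.argv) >= 4:
        ocr_pages(Path(sys.argv[1]), Path(sys.argv[2]), {int(p) for p in sys.argv[3:]})
    else:
        print(f"usage: {sys.argv[0]} IN.pdf OUT.pdf PAGE [PAGE ...]", file=sys.stderr)
```

For example, python ocr_pages.py scan.pdf scan-ocr.pdf 2 3 5 (hypothetical file names) would OCR pages 2, 3 and 5 of scan.pdf and keep its other pages as they are.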