I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one
Use dtSearch to create an index of all the PDF files, then view the log file from the indexing run to check the list of PDF files that were not indexed.
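If you want a rough cross-check without dtSearch, here is a minimal Python sketch. It is a different technique from the log-file approach, not a replacement for it. It assumes that a PDF with no `/Font` reference anywhere in its raw bytes probably has no text layer. The function names `looks_image_only` and `find_image_only_pdfs` are made up for illustration. The heuristic is crude: PDFs that store their objects in compressed streams can hide the `/Font` entry and get flagged wrongly, and a file with even one text page will not be flagged, so treat the result as a list of candidates to inspect.

```python
from pathlib import Path


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude heuristic (an assumption, not a PDF parser):
    no '/Font' reference in the raw bytes suggests there is no text layer."""
    return b"/Font" not in pdf_bytes


def find_image_only_pdfs(folder: str) -> list[str]:
    """Return the sorted paths of PDF files under `folder` that look image-only."""
    candidates = []
    for path in Path(folder).rglob("*"):
        # Match the extension case-insensitively (.pdf, .PDF, ...).
        if path.is_file() and path.suffix.lower() == ".pdf":
            if looks_image_only(path.read_bytes()):
                candidates.append(str(path))
    return sorted(candidates)
```

Calling `find_image_only_pdfs("scans")` on a hypothetical `scans` folder would give you the files worth running through OCR first.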