How to know if a PDF contains only images or has been OCR scanned for searching?

借酒劲吻你 2020-12-08 10:35

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one

7 answers
  •  一整个雨季
    2020-12-08 11:03

    Sorry to dig up an old thread, but if you found this, have a look at my thread:

    Batch OCR Program for PDFs

    You can get extra information about the PDF by running cat on it in Unix/Linux/OS X, or by opening it in "rb" mode in Python. (Of course, that's Python, and you didn't want to use that, but maybe your language has something equivalent.)
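
    As a rough illustration of the "rb" idea, here is a minimal Python sketch that reads the raw bytes and counts a couple of markers. The function names (summarize_pdf_bytes, summarize_pdf_file, looks_image_only) are made up for this example, and the heuristic itself is an assumption: a page image usually shows up as "/Subtype /Image", and an OCR text layer usually brings at least one "/Font" entry with it. Newer PDFs can pack these dictionaries into compressed object streams, in which case the markers will not be visible in the raw bytes, so treat a zero count as a hint, not proof.

```python
import re


def summarize_pdf_bytes(data: bytes) -> dict:
    """Count a few raw markers in the bytes of a PDF file.

    This only looks at uncompressed parts of the file; it does not parse the PDF.
    """
    return {
        # Every PDF starts with a "%PDF-" header.
        "is_pdf": data.startswith(b"%PDF-"),
        # "/Font" as a whole name (not "/FontDescriptor", "/FontName", ...).
        "font_refs": len(re.findall(rb"/Font\b", data)),
        # Image XObjects are declared as "/Subtype /Image".
        "image_refs": len(re.findall(rb"/Subtype\s*/Image\b", data)),
    }


def summarize_pdf_file(path: str) -> dict:
    """Open the file in "rb" mode, as suggested above, and summarize it."""
    with open(path, "rb") as handle:
        return summarize_pdf_bytes(handle.read())


def looks_image_only(data: bytes) -> bool:
    """Heuristic guess: images are present but no font is referenced."""
    info = summarize_pdf_bytes(data)
    return info["image_refs"] > 0 and info["font_refs"] == 0
```

    To try it, call summarize_pdf_file on the path of one of your PDFs and compare the counts for a file you know was OCRed against one you know was not.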
