How to know if a PDF contains only images or has been OCR scanned for searching?

借酒劲吻你 2020-12-08 10:35

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one

7 answers
  •  清歌不尽
    2020-12-08 11:12

    Apago's pdfspy extracts information from a PDF into an XML file, including details about the document's images and text. For your project, the useful information includes the image count and sizes, and where there is OCR (hidden) text.

    http://www.apagoinc.com/pdfspy
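
If a separate tool is not an option, the same distinction can be roughly approximated in Python with only the standard library. The sketch below is not how pdfspy works. It is a heuristic that assumes the content streams are Flate-compressed, the file is not encrypted, and the OCR layer was written with text rendering mode 3 (`3 Tr`, invisible text), which is the usual way to hide OCR text behind a scanned image. The function names and the returned labels are made up for illustration, and the result is per file, not per page.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" for every stream object.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A string or array operand followed by a text-showing operator (Tj or TJ).
SHOW_TEXT_RE = re.compile(rb"[)\]>]\s*(?:Tj|TJ)\b")
# Text rendering mode 3 = neither fill nor stroke, i.e. invisible text.
INVISIBLE_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")
# Image XObjects are stream objects, so their dictionaries are normally
# readable directly in the file bytes.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def flate_streams(pdf_bytes):
    """Yield the decompressed body of every stream that zlib can decode."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj() tolerates the end-of-line bytes before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image); skip it


def classify_pdf(pdf_bytes):
    """Return a hypothetical label describing what the PDF appears to contain."""
    has_text = has_invisible = False
    for data in flate_streams(pdf_bytes):
        if SHOW_TEXT_RE.search(data):
            has_text = True
            if INVISIBLE_RE.search(data):
                has_invisible = True
    if has_invisible:
        return "ocr_hidden_text"   # text is drawn, but in invisible mode
    if has_text:
        return "visible_text"      # ordinary text operators found
    if IMAGE_RE.search(pdf_bytes):
        return "image_only"        # images present, no text operators found
    return "unknown"
```

To try it on a file, read it as bytes (for example with `pathlib.Path(name).read_bytes()`) and pass the result to `classify_pdf`. Because the patterns are matched against raw decompressed bytes, Flate-compressed pixel data can occasionally produce a false hit, so treat the answer as a first-pass filter and confirm doubtful files with a real PDF tool.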
