How to know if a PDF contains only images or has been OCR scanned for searching?


借酒劲吻你 2020-12-08 10:35

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one

7 Answers

一整个雨季 (original poster)

2020-12-08 11:03

Sorry to dig up an old thread, but if you found this, have a look at my thread:

Batch OCR Program for PDFs

You can get extra information about the PDF by catting it on Unix/Linux/OS X, or by opening it in "rb" mode in Python. (Of course, that's Python and you didn't want to use that, but maybe your language has something equivalent.)
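For anyone who does end up using Python, here is a rough sketch of the "rb" idea: read the raw bytes and count a couple of markers. The function names (`sniff_pdf_bytes`, `looks_image_only`, `sniff_pdf_file`) are made up for illustration, and the rule "images but no `/Font` entries means no OCR text layer" is my assumption, not a guarantee. PDFs that pack their objects into compressed object streams can hide these markers, so treat the result as a hint only.

```python
import re


def sniff_pdf_bytes(data: bytes) -> dict:
    """Count raw /Font and /Subtype /Image markers in PDF bytes.

    Heuristic only: markers inside compressed object streams are not seen.
    """
    return {
        # \b keeps /FontDescriptor and /FontFile from being counted as /Font
        "fonts": len(re.findall(rb"/Font\b", data)),
        # matches both "/Subtype /Image" and "/Subtype/Image", not /ImageMask
        "images": len(re.findall(rb"/Subtype\s*/Image\b", data)),
    }


def looks_image_only(data: bytes) -> bool:
    """Guess (assumption!) that images without any fonts means no OCR layer."""
    counts = sniff_pdf_bytes(data)
    return counts["images"] > 0 and counts["fonts"] == 0


def sniff_pdf_file(path: str) -> dict:
    """Open the file in "rb" mode, as suggested above, and count markers."""
    with open(path, "rb") as fh:
        return sniff_pdf_bytes(fh.read())
```

To use it on a real file, call `sniff_pdf_file("scan.pdf")` (the path is a placeholder) and look at the two counts. A file with many images and zero fonts is a good candidate for the OCR batch.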
