Check if a PDF file is a scanned one

傲寒 2020-12-09 20:08

What is the best way to programmatically check whether a PDF file consists entirely of scanned pages? I have iText and PDFBox at my disposal. I can check if a PDF file contains text or

6 answers
  •  失恋的感觉
    2020-12-09 21:01

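    # Print every PDF for which pdffonts lists no fonts; the filename is passed as $1 so quotes or $ in names are not re-parsed by bash.
    find . -name "*.pdf" -print0 | xargs -0 -I {} bash -c 'file="$1"; if [ $(pdffonts "$file" 2> /dev/null | wc -l) -lt 3 ]; then echo "$file"; fi' _ {}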
    

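    Explanation: pdffonts file.pdf prints two header lines followed by one line per font, so its output has more than 2 lines only if the PDF contains text set in a font. The command prints the names of all PDF files that list no fonts, which are most likely scanned PDFs.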
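
The same check can be sketched as a reusable function, assuming pdffonts (shipped with Poppler or Xpdf) is on the PATH. The names count_font_rows and looks_scanned are hypothetical helpers introduced here for illustration. Two caveats apply to this approach in general: a scanned PDF that carries an OCR text layer does list fonts and will not be flagged, and a file that pdffonts cannot read produces no output and will be flagged as if it had no fonts.

```shell
#!/usr/bin/env bash

# Read pdffonts output on stdin and print the number of font rows.
# pdffonts prints two header lines, then one line per font.
count_font_rows() {
    awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Hypothetical helper: exit status 0 if pdffonts lists no fonts for "$1".
looks_scanned() {
    local rows
    rows=$(pdffonts "$1" 2> /dev/null | count_font_rows)
    [ "$rows" -eq 0 ]
}
```

For example, with a placeholder file name: looks_scanned report.pdf && echo "report.pdf lists no fonts". Counting rows after the header, rather than comparing the total line count to 3, keeps the header assumption in one place.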
