Check whether a PDF-File is valid with Python

借酒劲吻你 2020-12-08 10:50

I receive a file via an HTTP upload and need to be sure it's a PDF file. The programming language is Python, but this should not matter.

I thought of the follow

7 answers
  • 2020-12-08 11:44

    By valid do you mean that it can be displayed by a PDF viewer, or that the text can be extracted? They are two very different things.

    If you just want to check that it really is a PDF file that has been uploaded then the pyPDF solution, or something similar, will work.
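
    As a rough illustration of the "something similar" route, here is a minimal sketch using only the standard library. It is not the pyPDF solution; it only looks for the `%PDF-` header marker near the start of the upload and the `%%EOF` marker near the end. The function name `looks_like_pdf` and the 1024-byte search windows are assumptions made for this example, and passing this check does not prove that a viewer can render the file.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check on raw upload bytes (hypothetical helper)."""
    # Some viewers tolerate junk before the header, so search a window
    # at the start instead of requiring the marker at offset 0.
    has_header = b"%PDF-" in data[:1024]
    # The end-of-file marker is expected somewhere in the last bytes.
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer
```

    A renamed image or text file is rejected by this check, but a truncated or internally corrupt PDF that still carries both markers is not, so a real parser is still needed for stronger guarantees.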

    If, however, you want to check that the text can be extracted then you have found a whole world of pain! Using pdftotext would be a simple solution that works in the majority of cases, but it is by no means 100% successful. We have found many examples of PDFs that pdftotext cannot extract text from but that Java libraries such as iText and PDFBox can.
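
    For the pdftotext route, a sketch along these lines might do, assuming the `pdftotext` command-line tool is installed and on the `PATH`. The helper name `extract_text`, the 30-second timeout, and the choice to treat empty output as a failure are all assumptions made for this example. A `None` result only means this particular tool could not extract text, not that the PDF is invalid.

```python
import shutil
import subprocess
from typing import Optional


def extract_text(path: str, tool: str = "pdftotext") -> Optional[str]:
    """Return the extracted text, or None if extraction failed (hypothetical helper)."""
    # Bail out early if the external tool is not installed.
    if shutil.which(tool) is None:
        return None
    try:
        # "-" as the output file asks pdftotext to write to stdout.
        result = subprocess.run(
            [tool, path, "-"], capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    text = result.stdout.decode("utf-8", errors="replace")
    # Scanned or image-only PDFs typically yield no text at all.
    return text if text.strip() else None
```

    Calling `extract_text("upload.pdf")` (the file name is just a placeholder) returns a string when extraction succeeds; when it returns `None`, falling back to another extractor may still work, as noted above.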
