
Check whether a PDF file is valid with Python


借酒劲吻你

I receive a file via an HTTP upload and need to be sure it's a PDF file. The programming language is Python, but this should not matter.

I thought of the follow
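Since the question says the language should not matter, the cheapest language-neutral check is structural: a PDF file begins with the `%PDF-` header and normally ends with an `%%EOF` marker. A minimal sketch in Python, assuming a seekable binary file object; the function name `looks_like_pdf` and the 1024-byte tail window are illustrative choices, not taken from the question:

```python
import io
from typing import BinaryIO

PDF_HEADER = b"%PDF-"  # a PDF starts with "%PDF-" followed by the version, e.g. "%PDF-1.7"
EOF_MARKER = b"%%EOF"  # end-of-file marker written after the trailer


def looks_like_pdf(stream: BinaryIO, tail_size: int = 1024) -> bool:
    """Cheap signature check; it does NOT prove the file can be parsed."""
    stream.seek(0)
    if stream.read(len(PDF_HEADER)) != PDF_HEADER:
        return False
    # Some writers append stray bytes after %%EOF, so look for the marker
    # anywhere in the last `tail_size` bytes instead of only at the very end.
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    stream.seek(max(0, size - tail_size))
    return EOF_MARKER in stream.read()
```

For an upload held in memory, wrap the bytes as `looks_like_pdf(io.BytesIO(data))`. This only confirms that the file claims to be a PDF; a corrupted body between the header and the trailer will still pass.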


7 Answers

春和景丽

2020-12-08 11:44

By valid do you mean that it can be displayed by a PDF viewer, or that the text can be extracted? They are two very different things.

If you just want to check that it really is a PDF file that has been uploaded, then the pyPDF solution, or something similar, will work.

If, however, you want to check that the text can be extracted, then you have found a whole world of pain! Using pdftotext is a simple solution that works in the majority of cases, but it is by no means 100% reliable. We have found many examples of PDFs that pdftotext cannot extract text from but Java libraries such as iText and PDFBox can.
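The pdftotext route can be sketched the same way, assuming the `pdftotext` command-line tool (from Poppler or Xpdf) is on `PATH`. The helper `has_extractable_text` and its `min_chars` threshold are hypothetical; the sketch treats a non-zero exit status, a missing binary, or a timeout as failure:

```python
import os
import subprocess
import tempfile


def has_extractable_text(data: bytes, min_chars: int = 1, timeout: float = 30.0) -> bool:
    """Return True if pdftotext extracts at least `min_chars` non-whitespace characters."""
    # pdftotext reads from a file path, so spill the upload to a temporary file.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        # "-q" suppresses messages; "-" as the output file writes text to stdout.
        result = subprocess.run(
            ["pdftotext", "-q", path, "-"],
            capture_output=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Missing binary or a hang: report "no extractable text" in this sketch.
        return False
    finally:
        os.unlink(path)
    if result.returncode != 0:
        return False
    text = result.stdout.decode("utf-8", errors="replace")
    return len("".join(text.split())) >= min_chars
```

A scanned PDF that contains only images will return False here even though it displays fine, which is the display-versus-extraction distinction made at the start of this answer.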
