Check whether a PDF-File is valid with Python

借酒劲吻你 2020-12-08 10:50

I receive a file via an HTTP upload and need to be sure it's a PDF file. The programming language is Python, but this should not matter.

I thought of the follow

7 Answers
  •  隐瞒了意图╮
    2020-12-08 11:18

    The two most commonly used PDF libraries for Python are:

    • pyPdf
    • ReportLab

    Both are pure Python, so they should be easy to install and work cross-platform.

    With pyPdf it would probably be as simple as doing:

    from pyPdf import PdfFileReader
    # open() replaces the Python 2-only file() builtin; the constructor parses the file
    doc = PdfFileReader(open("upload.pdf", "rb"))
    

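    This should be enough, since the constructor fails on a file it cannot parse. If you want to do further checking, doc now also has getDocumentInfo() and getNumPages() methods (also available as the documentInfo and numPages properties).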
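
For an upload handler, the check above is probably most useful as a yes/no helper. The following is a minimal sketch, not a definitive implementation: the names `is_valid_pdf` and `parse` are hypothetical, and it assumes that the parser (pyPdf's `PdfFileReader` by default) raises some exception when the file is not a readable PDF. The parser is passed in as a parameter so that a different reader class can be swapped in.

```python
def is_valid_pdf(path, parse=None):
    """Return True if `parse` accepts the file at `path`, False if it raises.

    `parse` is any callable that takes a binary file object and raises on
    invalid input (hypothetical parameter, added for illustration).
    """
    if parse is None:
        # Assumption: pyPdf is installed. Imported lazily so that a missing
        # library surfaces as ImportError instead of "invalid PDF".
        from pyPdf import PdfFileReader as parse
    try:
        with open(path, "rb") as fh:
            parse(fh)  # constructing the reader is the actual check
        return True
    except Exception:
        # Broad on purpose: malformed input can fail in several ways.
        # Note that an unreadable or missing file also counts as invalid here.
        return False
```

Usage would be something like `is_valid_pdf("upload.pdf")`. Catching every exception is a deliberate simplification; a stricter version could catch only the parser's own error type.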

    As Carl answered, pdftotext is also a good solution, and would probably be faster on very large documents (especially ones with many cross-references). However, it might be a little slower on small PDFs because of the system overhead of forking a new process.
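
The pdftotext route could be sketched as below. This is an illustration under assumptions, not a tested recipe: the helper name `check_with_pdftotext` and the 30-second timeout are hypothetical, and it assumes the `pdftotext` command-line tool is on the PATH and exits with a non-zero status when it cannot open the file.

```python
import shutil
import subprocess


def check_with_pdftotext(path, timeout=30):
    """Return True or False based on pdftotext's exit status.

    Returns None when the pdftotext executable cannot be found, so the
    caller can tell "tool missing" apart from "invalid PDF".
    """
    exe = shutil.which("pdftotext")
    if exe is None:
        return None
    try:
        # "-" sends the extracted text to stdout, which is discarded;
        # only the exit status matters for this check.
        result = subprocess.run(
            [exe, path, "-"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a file that takes this long is treated as invalid.
        return False
    return result.returncode == 0
```

Each call starts a new process, which is the per-file overhead mentioned above.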
