How can I get the total number of pages of a PDF using pdfminer in Python?

旧巷少年郎 2021-02-06 17:10

In PyPDF2, pdfreader.getNumPages() gives me the total number of pages in a PDF file.

How can I get this using pdfminer?

4 Answers
  •  我寻月下人不归
    2021-02-06 17:37

    I hate to just leave a code snippet, so for context: the current pdfminer.six repository is a good place to learn a little more about the resolve1 function.

    While working with pdfminer, you may print objects and come across PDFObjRef instances. These are indirect references to other objects in the file; resolve1 expands such a reference into the object it points to (usually a dictionary).

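    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdftypes import resolve1
    
    # Keep the file open while using the document: pdfminer reads objects lazily.
    with open('some_file.pdf', 'rb') as file:
        parser = PDFParser(file)
        document = PDFDocument(parser)
    
        # /Pages is usually an indirect reference, so resolve it to the
        # page-tree root dictionary. Its /Count entry is the total number of
        # pages; /Count may itself be a reference, so resolve that too.
        print(resolve1(resolve1(document.catalog['Pages'])['Count']))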
    
