PyPDF 2 Decrypt Not Working

悲哀的现实 2020-12-15 20:38

I am currently using PyPDF2 as a dependency.

I have encountered some encrypted files and handled them as you normally would (in the following code):

7 Answers
  • 2020-12-15 20:52

    You can try the PyMuPDF package; it can open encrypted files, and it solved my problem.

    Reference: PyMuPDF Documentation
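
    A minimal sketch of what that might look like, assuming PyMuPDF is installed and importable under its traditional module name fitz. The helper name open_pdf is made up for illustration; needs_pass and authenticate() are the document members PyMuPDF provides for password handling.

```python
def open_pdf(path, password=""):
    """Open a PDF with PyMuPDF, authenticating if a password is required."""
    import fitz  # PyMuPDF; imported here so the sketch loads without it installed

    doc = fitz.open(path)
    if doc.needs_pass:
        # authenticate() returns 0 when the password is rejected.
        if not doc.authenticate(password):
            doc.close()
            raise ValueError("wrong password for " + path)
    return doc
```

    Calling open_pdf("encrypted.pdf") (a hypothetical file name) would return a document whose page count is available as len(doc).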

  • 2020-12-15 21:01

    The following code could solve this problem:

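    import os
    import shlex
    from PyPDF2 import PdfFileReader

    # filename is assumed to hold the path to the PDF.
    fp = open(filename, 'rb')  # PDFs must be opened in binary mode
    pdfFile = PdfFileReader(fp)
    if pdfFile.isEncrypted:
        try:
            pdfFile.decrypt('')
            print('File Decrypted (PyPDF2)')
        except Exception:
            # PyPDF2 could not handle the encryption; fall back to qpdf,
            # which rewrites the file in place via a temporary copy.
            fp.close()
            quoted = shlex.quote(filename)  # protects names containing spaces
            command = ("cp " + quoted +
                " temp.pdf; qpdf --password='' --decrypt temp.pdf " + quoted
                + "; rm temp.pdf")
            os.system(command)
            print('File Decrypted (qpdf)')
            fp = open(filename, 'rb')
            pdfFile = PdfFileReader(fp)
    else:
        print('File Not Encrypted')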
    
  • 2020-12-15 21:02

    Thanks @Zijian He, your solution worked for me. The solution is to edit the pdf.py file of the PyPDF2 package:

    def getNumPages(self, password=''):
        """
        Calculates the number of pages in this PDF file.
    
        :return: number of pages
        :rtype: int
        :raises PdfReadError: if file is encrypted and restrictions prevent
            this action.
        """
    
        # Flattened pages will not work on an Encrypted PDF;
        # the PDF file's page count is used in this case. Otherwise,
        # the original method (flattened page count) is used.
        if self.isEncrypted:
            try:
                self._override_encryption = True
                self.decrypt(password)
                return self.trailer["/Root"]["/Pages"]["/Count"]
            except:
                raise utils.PdfReadError("File has not been decrypted")
            finally:
                self._override_encryption = False
        else:
            if self.flattenedPages == None:
                self._flatten()
            return len(self.flattenedPages)
    
    numPages = property(lambda self: self.getNumPages(), None, None)
    
  • 2020-12-15 21:03

    To answer my own question: if you have ANY spaces in your file name, the PyPDF2 decrypt function will ultimately fail despite returning a success code. Stick to underscores when naming your PDFs before you run them through PyPDF2.

    For example,

    Rather than "FDJKL492019 21490 ,LFS.pdf" do something like "FDJKL492019_21490_,LFS.pdf".
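
    One possible explanation, offered as a guess and not a confirmed cause: PyPDF2 itself receives a file object, but if the file name is also spliced unquoted into a shell command (for instance a qpdf fallback run through os.system), the shell splits it at each space. The sketch below uses only the standard library to show the effect; out.pdf is a placeholder name.

```python
import shlex

name = "FDJKL492019 21490 ,LFS.pdf"

# Unquoted: a shell would see three separate arguments instead of one name.
unquoted = "qpdf --decrypt " + name + " out.pdf"
# Quoted: shlex.quote wraps the name so it stays a single argument.
quoted = "qpdf --decrypt " + shlex.quote(name) + " out.pdf"

# shlex.split applies POSIX shell word-splitting rules to each command.
print(shlex.split(unquoted))
print(shlex.split(quoted))
```

    The first command splits into six words and the second into four, so quoting the name is an alternative to renaming the file.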

  • 2020-12-15 21:06

    When you call getNumPages(), the failure has nothing to do with whether the file has already been decrypted.

    If we take a look at the source code of getNumPages():

    def getNumPages(self):
        """
        Calculates the number of pages in this PDF file.
    
        :return: number of pages
        :rtype: int
        :raises PdfReadError: if file is encrypted and restrictions prevent
            this action.
        """
    
        # Flattened pages will not work on an Encrypted PDF;
        # the PDF file's page count is used in this case. Otherwise,
        # the original method (flattened page count) is used.
        if self.isEncrypted:
            try:
                self._override_encryption = True
                self.decrypt('')
                return self.trailer["/Root"]["/Pages"]["/Count"]
            except:
                raise utils.PdfReadError("File has not been decrypted")
            finally:
                self._override_encryption = False
        else:
            if self.flattenedPages == None:
                self._flatten()
            return len(self.flattenedPages)
    

    we can see that the self.isEncrypted property controls the flow. The isEncrypted property is read-only and stays True even after the PDF has been decrypted, so getNumPages() calls self.decrypt('') again with an empty password.

    So the easy way to handle this is to add the password as a keyword argument with an empty string as its default value, and to pass your password when calling getNumPages() and any other method built on top of it.
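
    If editing the installed package is not an option, the same idea can be sketched as a standalone helper. This assumes the PyPDF2 1.x PdfFileReader interface quoted above (isEncrypted, decrypt(), trailer, getNumPages()); the name count_pages is hypothetical.

```python
def count_pages(reader, password=""):
    """Return the page count of a PyPDF2 1.x PdfFileReader-like object.

    `reader` is assumed to expose isEncrypted, decrypt(), trailer and
    getNumPages(), as in the library source quoted above.
    """
    if reader.isEncrypted:
        # decrypt() returns 0 when the password matches neither the user
        # nor the owner password.
        if reader.decrypt(password) == 0:
            raise ValueError("password rejected")
        # Read the count from the page tree root, as getNumPages() does
        # for encrypted files.
        return int(reader.trailer["/Root"]["/Pages"]["/Count"])
    return reader.getNumPages()
```

    With a real reader this would be called as count_pages(PdfFileReader(open(path, 'rb')), 'my-password'), where path and the password are placeholders.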

  • 2020-12-15 21:09

    This error may be caused by 128-bit AES encryption on the PDF; see "Query - is there a way to bypass security restrictions on a pdf?"

    One workaround is to use qpdf to decrypt every PDF for which isEncrypted is true:

    qpdf --password='' --decrypt input.pdf output.pdf
    

    Even if your PDF does not appear password protected, it may still be encrypted with no password. The above snippet assumes this is the case.
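
    From Python, the same qpdf call can be made without going through a shell. A minimal sketch, assuming the qpdf executable is on PATH; the function names are made up for illustration, and treating exit code 3 as success relies on qpdf's documented convention that 3 means warnings only.

```python
import subprocess


def build_qpdf_command(src, dst, password=""):
    """Build the argument list for: qpdf --password=... --decrypt src dst."""
    # A list of arguments needs no shell quoting, so spaces in names are safe.
    return ["qpdf", "--password=" + password, "--decrypt", src, dst]


def decrypt_with_qpdf(src, dst, password=""):
    """Write a decrypted copy of src to dst using the qpdf command-line tool."""
    result = subprocess.run(build_qpdf_command(src, dst, password))
    # qpdf exits with 0 on success and 3 when it only emitted warnings.
    if result.returncode not in (0, 3):
        raise RuntimeError("qpdf failed with exit code %d" % result.returncode)
    return dst
```

    For example, decrypt_with_qpdf("input.pdf", "output.pdf") mirrors the command line above with an empty password.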
