Does a library exist to remove passwords from PDFs programmatically? [closed]

Question

Does a library exist that will remove "owner" passwords from PDF documents so that the text can then be programmatically extracted from them? Something like PDF Technologies' Password Recovery tool, but callable from the command line or from Python. A GUI is not really useful to me, since the number of documents is so large.

Please, no comments on the legality of the process. The PDFs in question are owned, and the text needs to be extracted in order to form keyword clouds for the document set.

Answer 1:

I do not know of any Python libraries, but for batch removal of passwords from PDF documents, my colleagues have had good experience with PwdRemover (not free).

Answer 2:

Here are two other (open-source) tools for command-line processing:

QPDF: A Content-Preserving PDF Transformation System:

qpdf --password=PASSWORD --decrypt SECURED.pdf UNSECURED.pdf

pdftk - the pdf toolkit:

pdftk SECURED.pdf input_pw PASSWORD output UNSECURED.pdf
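Since the question asks for something callable from Python over a large document set, here is a minimal sketch that drives the qpdf command above through subprocess. The function names (build_qpdf_command, decrypt_directory) and the directory layout are hypothetical, chosen only for illustration. It assumes qpdf is installed and on the PATH, and that an empty user password is enough to open the files, which is typically the case when only an "owner" password is set.

```python
import subprocess
from pathlib import Path


def build_qpdf_command(src, dst, password=""):
    """Return the argv list for the qpdf invocation shown above."""
    return ["qpdf", f"--password={password}", "--decrypt", str(src), str(dst)]


def decrypt_directory(src_dir, dst_dir, password=""):
    """Decrypt every *.pdf in src_dir into dst_dir.

    Returns a list of (path, error message) pairs for files qpdf rejected.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(src_dir).glob("*.pdf")):
        cmd = build_qpdf_command(src, dst_dir / src.name, password)
        result = subprocess.run(cmd, capture_output=True, text=True)
        # qpdf documents exit code 3 as "warnings, but output was written",
        # so only other non-zero codes are treated as failures here.
        if result.returncode not in (0, 3):
            failed.append((src, result.stderr.strip()))
    return failed


if __name__ == "__main__":
    # Hypothetical directory names; adjust to your document set.
    for path, error in decrypt_directory("secured", "unsecured"):
        print(f"{path}: {error}")
```

The decrypted copies can then be handed to whatever text extractor you already use. Passing the arguments as a list, rather than a shell string, avoids quoting problems with file names that contain spaces.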

Answer 3:

If you've forgotten the password or the employee who encrypted the documents has since left the company, you can use PDFCrack to recover the password(s).
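To call PDFCrack from Python, something along these lines may work. This is a rough sketch: the helper names are hypothetical, it assumes the pdfcrack binary is on the PATH, and the output pattern it parses (a line such as found owner-password: 'secret') is an assumption about pdfcrack's output that you should check against your installed version.

```python
import re
import subprocess


def build_pdfcrack_command(pdf_path, wordlist=None, owner=True):
    """Build the argv list for pdfcrack.

    -f names the input file, -o targets the owner password,
    -w supplies an optional wordlist (brute force is used otherwise).
    """
    cmd = ["pdfcrack", "-f", str(pdf_path)]
    if owner:
        cmd.append("-o")
    if wordlist is not None:
        cmd += ["-w", str(wordlist)]
    return cmd


# Assumed output format; verify against your pdfcrack version.
_FOUND = re.compile(r"found (user|owner)-password: '(.*)'")


def parse_pdfcrack_output(output):
    """Return the recovered password, or None if no match was reported."""
    match = _FOUND.search(output)
    return match.group(2) if match else None


def recover_password(pdf_path, wordlist=None, owner=True):
    """Run pdfcrack and return the password it reports, if any."""
    cmd = build_pdfcrack_command(pdf_path, wordlist, owner)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_pdfcrack_output(result.stdout)
```

A recovered password can then be passed to qpdf or pdftk to write an unencrypted copy. Brute-force runs can take a long time, so a wordlist is worth trying first.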

Source: https://stackoverflow.com/questions/1750716/does-a-library-exist-to-remove-passwords-from-pdfs-programmatically

Tags

python

pdf

pdf-generation

passwords