Does a library exist to remove passwords from PDFs programmatically? [closed]

泪湿孤枕 submitted on 2019-12-10 12:48:53

Question


Does a library exist that will remove "owner" passwords from PDF documents so that the text can then be programmatically extracted from them? Something like PDF Technologies' Password Recovery tool, but callable from the command line or from Python. A GUI is not really useful to me, since the number of documents is so large.

Please, no comments on the legality of the process. The PDFs in question are owned, and the text needs to be extracted in order to form keyword clouds for the document set.


Answer 1:


I do not know of any Python libraries, but for batch removal of passwords from PDF documents, my colleagues have had good experience with PwdRemover (not free).




Answer 2:


Here are two other (open-source) tools for command-line processing:

QPDF: A Content-Preserving PDF Transformation System:

qpdf --password=PASSWORD --decrypt SECURED.pdf UNSECURED.pdf

pdftk - the pdf toolkit:

pdftk SECURED.pdf input_pw PASSWORD output UNSECURED.pdf
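
For a large document set, the qpdf call above can be wrapped in a small Python loop. This is only a sketch and not part of the original answer: the function names `build_qpdf_command` and `decrypt_all` and the directory names in the usage note are made up for illustration, and it assumes `qpdf` is installed and on the PATH. When only an owner password is set, qpdf can normally open the file without being given a password, so the password argument is optional here.

```python
import subprocess
from pathlib import Path


def build_qpdf_command(src, dst, password=""):
    """Build the qpdf argument list (hypothetical helper, same flags as above)."""
    cmd = ["qpdf"]
    if password:
        # Only pass --password when one is actually needed.
        cmd.append(f"--password={password}")
    cmd += ["--decrypt", str(src), str(dst)]
    return cmd


def decrypt_all(src_dir, dst_dir, password="", runner=subprocess.run):
    """Decrypt every *.pdf in src_dir into dst_dir; return names that failed."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(src_dir).glob("*.pdf")):
        cmd = build_qpdf_command(src, dst_dir / src.name, password)
        result = runner(cmd, capture_output=True, text=True)
        # qpdf exits with 0 on success and 3 when it succeeded with warnings.
        if result.returncode not in (0, 3):
            failed.append(src.name)
    return failed
```

Calling `decrypt_all("secured", "unsecured")` would then write a decrypted copy of each file and return the names of the ones qpdf could not process, which can be retried with a password or handled separately.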



Answer 3:


If you've forgotten the password or the employee who encrypted the documents has since left the company, you can use PDFCrack to recover the password(s).



Source: https://stackoverflow.com/questions/1750716/does-a-library-exist-to-remove-passwords-from-pdfs-programmatically