How to extract text from a PDF file?

孤城傲影 2020-11-22 14:05

I'm trying to extract the text included in this PDF file using Python.

I'm using the PyPDF2 module, and have the following script:

imp         


        
24 answers
  •  小蘑菇 (original poster)
    2020-11-22 14:39

    Use textract.

    • http://textract.readthedocs.io/en/latest/
    • https://github.com/deanmalmgren/textract

    It supports many file types, including PDFs:

    import textract
    text = textract.process("path/to/file.extension")
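
    As far as I know, textract.process returns the extracted text as bytes rather than str, so you will usually want to decode it before working with it. Here is a minimal sketch of a wrapper that does this. The names extract_text and the process hook are my own and not part of textract, and UTF-8 is only an assumed default encoding.

```python
def extract_text(path, encoding="utf-8", process=None):
    """Return the text extracted from the file at `path` as a str.

    `process` is a hypothetical hook (not part of textract) that lets you
    swap in another extractor; by default textract.process is used.
    """
    if process is None:
        # Imported lazily so that textract is only needed when it is used.
        import textract  # third-party: pip install textract
        process = textract.process

    raw = process(path)
    # Assumption: the extractor returns bytes; decode them into str.
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw
```

    Calling extract_text("path/to/file.extension") should then give you a regular string that you can search or split.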
    
