How to extract text from a PDF file?

孤城傲影 2020-11-22 14:05

I'm trying to extract the text included in this PDF file using Python.

I'm using the PyPDF2 module, and have the following script:

imp         


        
24 Answers
  •  感动是毒
    2020-11-22 14:35

    PyPDF2 does work, but results may vary. I am seeing quite inconsistent output from its text extraction.

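    import PyPDF2

    # Runs inside a class method; self._path is the path to the PDF file.
    # PdfFileReader/getNumPages/getPage/extractText are the legacy PyPDF2 1.x names.
    reader = PyPDF2.PdfFileReader(self._path)
    eachPageText = []
    for i in range(reader.getNumPages()):
        pageText = reader.getPage(i).extractText()
        print(pageText)
        eachPageText.append(pageText)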
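
Since extraction quality varies from file to file, it can help to check which pages came back empty before trusting the result. The following is only a minimal sketch, not part of the original answer: it assumes the newer `pypdf` package (the successor to PyPDF2), whose reader exposes `pages` and `extract_text()`, and the helper names `extract_pages` and `blank_pages` are hypothetical.

```python
def extract_pages(reader):
    """Return one string per page; a page with no extractable text becomes ''."""
    # Works with any object exposing .pages whose items have extract_text(),
    # such as pypdf.PdfReader (assumption: pypdf is installed).
    return [(page.extract_text() or "") for page in reader.pages]


def blank_pages(texts):
    """Return the 0-based indices of pages whose extracted text is empty."""
    return [i for i, text in enumerate(texts) if not text.strip()]


# Usage, assuming pypdf is installed and "example.pdf" is a hypothetical path:
#   from pypdf import PdfReader
#   texts = extract_pages(PdfReader("example.pdf"))
#   print("pages with no text:", blank_pages(texts))
```

Pages reported as blank are often scanned images with no text layer, in which case no text extractor will help and OCR would be needed instead.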
    
