How to extract text from a PDF file?

孤城傲影 2020-11-22 14:05

I'm trying to extract the text included in this PDF file using Python.

I'm using the PyPDF2 module, and have the following script:

imp         


        
24 Answers
  •  陌清茗 (original poster)
    2020-11-22 14:38

    Here is the simplest code for extracting text from a single page:

    # import the reader class (PdfReader replaces the deprecated
    # PdfFileReader; this assumes PyPDF2 2.0 or later)
    from PyPDF2 import PdfReader

    # open the PDF in binary mode; the with-block closes the file automatically
    with open('filename.pdf', 'rb') as pdf_file:
        # create a PDF reader object
        reader = PdfReader(pdf_file)

        # print the number of pages in the PDF
        print(len(reader.pages))

        # get a page object; indices are zero-based, so 5 is the sixth page
        # (raises IndexError if the PDF has fewer than six pages)
        page = reader.pages[5]

        # extract and print the text of that page
        print(page.extract_text())
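
If you need the text of the whole document rather than one page, something along these lines may work. It is only a sketch, assuming PyPDF2 2.0 or later (where `PdfReader`, `reader.pages` and `extract_text()` are available). The helper names `join_page_texts` and `extract_pdf_text` are made up for illustration and are not part of PyPDF2.

```python
from typing import Iterable


def join_page_texts(pages: Iterable, separator: str = "\n\n") -> str:
    """Concatenate the text of each page object, skipping pages with no text.

    Each page only needs an extract_text() method, so any object with that
    method works here (hypothetical helper, not a PyPDF2 function).
    """
    chunks = []
    for page in pages:
        # Treat a None result the same as an empty string
        text = (page.extract_text() or "").strip()
        if text:
            chunks.append(text)
    return separator.join(chunks)


def extract_pdf_text(path: str) -> str:
    """Open the PDF at `path` and return the text of all its pages."""
    # Imported here so join_page_texts can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 2.0

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        return join_page_texts(reader.pages)
```

Calling `extract_pdf_text('filename.pdf')` would then return one string with the pages separated by blank lines. Scanned PDFs that contain only images will usually come back empty, since there is no text layer to extract.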
    
