How to extract text from a PDF file?

孤城傲影 2020-11-22 14:05

I'm trying to extract the text included in this PDF file using Python.

I'm using the PyPDF2 module, and have the following script:

imp         


        
24 answers
  •  青春惊慌失措
    2020-11-22 14:34

    A multi-page PDF can be extracted as text in a single pass, instead of passing each page number individually, using the code below:

    import PyPDF2

    # Uses the legacy PyPDF2 1.x method names from the original answer;
    # PyPDF2 3.x renamed them (PdfReader, reader.pages, extract_text).
    with open('samples.pdf', 'rb') as pdf_file:
        read_pdf = PyPDF2.PdfFileReader(pdf_file)
        number_of_pages = read_pdf.getNumPages()
        # Visit every page in order and print its text
        for i in range(number_of_pages):
            page = read_pdf.getPage(i)
            page_content = page.extractText()
            print(page_content)
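
The loop above relies on PyPDF2's older camelCase method names. As a rough sketch only, the same page-by-page extraction could be written against the newer `PdfReader` interface found in recent PyPDF2 releases. The helper names `extract_all_text` and `read_pdf_text` are hypothetical and chosen for illustration. The sketch assumes `extract_text()` may return an empty value for pages that have no extractable text, such as scanned images.

```python
def extract_all_text(reader):
    """Join the text of every page of a PdfReader-like object.

    Hypothetical helper: `reader` only needs a `pages` sequence whose
    items expose an `extract_text()` method.
    """
    # Treat pages with no extractable text as empty strings
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def read_pdf_text(path):
    """Open the PDF at `path` and return the text of all pages."""
    # Imported here so extract_all_text can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        return extract_all_text(PdfReader(pdf_file))
```

With this in place, `print(read_pdf_text('samples.pdf'))` would print the text of the whole document at once. Keeping the joining logic separate from the file handling makes it possible to check the loop without a real PDF.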
    
