How to read a PDF file line by line using pyPdf?

闹比i 2020-12-05 03:04

I have some code that reads from a PDF file. Is there a way to read the file line by line (rather than page by page) using pyPdf with Python 2.6 on Windows?

Here is the code for

3 answers
  •  轻奢々 (OP)
    2020-12-05 03:12

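    import pyPdf

    def getPDFContent(path):
        """Return the text of every page as one whitespace-normalized string."""
        content = ""
        # A with-block closes the file when done (available in Python 2.6)
        with open(path, "rb") as p:
            pdf = pyPdf.PdfFileReader(p)
            # Use the real page count; a hard-coded 10 fails on shorter PDFs
            for i in range(pdf.getNumPages()):
                content += pdf.getPage(i).extractText() + "\n"
        # Replace non-breaking spaces, then collapse all whitespace (newlines too)
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
        return content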
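
The function above joins everything into a single string, so any line breaks are lost. A minimal sketch of one way to get lines instead is shown below, assuming the extracted page text actually contains newline characters (this depends on the PDF, since PDF files store positioned text rather than lines, and `extractText()` may return text with no line breaks at all). The helper name `iter_lines` and the sample strings are hypothetical and stand in for real `extractText()` output.

```python
def iter_lines(page_texts):
    """Yield non-empty, stripped lines from an iterable of per-page text strings."""
    for text in page_texts:
        # Normalize non-breaking spaces, as the snippet above does
        text = text.replace(u"\xa0", u" ")
        for line in text.splitlines():
            line = line.strip()
            if line:  # skip blank lines
                yield line


# Hypothetical stand-ins for the strings returned by pdf.getPage(i).extractText()
sample_pages = [u"First line\nSecond\xa0line\n\n", u"  Third line  "]
lines = list(iter_lines(sample_pages))
for line in lines:
    print(line)
```

With pyPdf, the same helper could be fed a generator such as `(pdf.getPage(i).extractText() for i in range(pdf.getNumPages()))`. If the output comes back as one long line per page, the line structure was not preserved by the extraction and cannot be recovered by splitting alone.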
    
