How to extract text from a PDF in Python 3.7

后悔当初 2020-12-29 10:19

I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an exce

10 answers
  •  遥遥无期
    2020-12-29 11:07

    If you are looking for a maintained, bigger project, have a look at PyMuPDF. Install it with pip install pymupdf and use it like this:

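    import fitz  # PyMuPDF
    
    def get_text(filepath: str) -> str:
        """Return the text of every page in the PDF, in page order."""
        with fitz.open(filepath) as doc:
            # get_text() is the current method name; older PyMuPDF releases
            # spelled it getText().
            # Join pages with a newline so the last word of one page does not
            # run into the first word of the next.
            return "\n".join(page.get_text().strip() for page in doc)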
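
For a bank statement it is usually easier to work with individual lines than with one long string. Here is a minimal sketch of that, assuming the PDF has a text layer (a scanned statement would come back empty). The helper names `non_empty_lines` and `statement_lines` are made up for illustration and are not part of PyMuPDF:

```python
from typing import Iterable, List


def non_empty_lines(page_texts: Iterable[str]) -> List[str]:
    """Split each page's text into lines and drop the blank ones."""
    lines = []
    for text in page_texts:
        for line in text.splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return lines


def statement_lines(filepath: str) -> List[str]:
    """Return every non-empty text line of the PDF, in page order."""
    # Imported here so non_empty_lines() stays usable without PyMuPDF installed.
    import fitz  # PyMuPDF

    with fitz.open(filepath) as doc:
        # get_text() is the current name; older releases spelled it getText().
        return non_empty_lines(page.get_text() for page in doc)
```

You would call it as `statement_lines("statement.pdf")`, where the path is just a placeholder, and then loop over the returned list to pick out the rows you care about.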
    
