How to extract text from a PDF in Python 3.7

后悔当初 2020-12-29 10:19

I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an exce

10 answers

遥遥无期 (original poster)

2020-12-29 11:07
If you are looking for a maintained, bigger project, have a look at PyMuPDF. Install it with pip install pymupdf and use it like this:
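```
import fitz  # PyMuPDF; installed with "pip install pymupdf"

def get_text(filepath: str) -> str:
    """Return the text of every page in the PDF, concatenated."""
    with fitz.open(filepath) as doc:
        # get_text() is the current name of the older getText() method
        return "".join(page.get_text().strip() for page in doc)
```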
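Once the text is extracted, the next step for a bank statement is usually to pick out the transaction lines. Here is a minimal sketch of that step using only the standard library. The line layout (`DD/MM/YYYY  description  amount`), the `LINE_PATTERN` regex and the `parse_statement_lines` helper are all assumptions made for illustration; a real statement will very likely need a different pattern.

```python
import re
from typing import List, Tuple

# Hypothetical layout: "DD/MM/YYYY  description  amount" on one line.
# Adjust the pattern to whatever your statement actually looks like.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?\d[\d,]*\.\d{2})$"
)


def parse_statement_lines(text: str) -> List[Tuple[str, str, float]]:
    """Return (date, description, amount) for each line matching the pattern."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # headers, footers and anything else that is not a transaction
        # Drop thousands separators before converting to a number
        amount = float(match.group("amount").replace(",", ""))
        rows.append((match.group("date"), match.group("description"), amount))
    return rows
```

You would call it on the string returned by the extraction function, for example `parse_statement_lines(get_text("statement.pdf"))`, where `statement.pdf` is a placeholder file name. Keeping the parsing separate from the extraction means you can test the pattern on plain strings without needing a PDF.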