I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an exce
If you are looking for a maintained, bigger project, have a look at PyMuPDF. Install it with pip install pymupdf and use it like this:
pip install pymupdf
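import fitz  # PyMuPDF; installed by `pip install pymupdf`


def get_text(filepath: str) -> str:
    # Open the PDF and concatenate the stripped text of every page.
    with fitz.open(filepath) as doc:
        text = ""
        for page in doc:
            # get_text() replaces the deprecated getText() alias.
            text += page.get_text().strip()
        return text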
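For a bank statement it is often easier to work with individual lines than with one long string. Here is a minimal sketch of that step, assuming the per-page text has already been extracted (for example with `page.get_text()`). The helper name `statement_lines` is hypothetical and not part of PyMuPDF.

```python
from typing import Iterable, List


def statement_lines(page_texts: Iterable[str]) -> List[str]:
    """Flatten per-page text into a list of non-empty, stripped lines."""
    lines: List[str] = []
    for text in page_texts:
        for line in text.splitlines():
            line = line.strip()
            if line:  # skip blank lines left over from the PDF layout
                lines.append(line)
    return lines
```

With PyMuPDF this could be called as `statement_lines(page.get_text() for page in doc)` while the document is still open. Because each page is split separately, the last line of one page is never glued to the first line of the next.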