I'm trying to extract the text included in this PDF file using Python.
I'm using the PyPDF2 module, and have the following script:
imp
I recommend using pymupdf or pdfminer.six.
These packages are no longer maintained:
pdfminer (without .six)

There are different options, which give different results, but the most basic one is:
import fitz  # this is pymupdf

with fitz.open("my.pdf") as doc:
    text = ""
    for page in doc:
        # get_text() is the current name; older releases called it getText()
        text += page.get_text()

print(text)
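
For comparison, here is a rough sketch of both recommended libraries side by side. The function names and the `join_pages` helper are my own, not part of either library, and the imports sit inside the functions so you only need whichever package you actually have installed. Treat it as a starting point rather than the one right way to do it.

```python
from typing import Iterable


def join_pages(page_texts: Iterable[str], separator: str = "\n\n") -> str:
    """Join per-page strings, dropping pages that contain only whitespace.

    Hypothetical helper for this example, not a library function.
    """
    return separator.join(t.strip() for t in page_texts if t.strip())


def extract_text_pymupdf(path: str) -> str:
    """Extract text page by page with PyMuPDF."""
    import fitz  # PyMuPDF; install with `pip install pymupdf`

    with fitz.open(path) as doc:
        # The generator is consumed before the document is closed.
        return join_pages(page.get_text() for page in doc)


def extract_text_pdfminer(path: str) -> str:
    """Extract the text of the whole document with pdfminer.six."""
    from pdfminer.high_level import extract_text  # `pip install pdfminer.six`

    return extract_text(path)
```

Calling `extract_text_pymupdf("my.pdf")` or `extract_text_pdfminer("my.pdf")` returns a single string. The two libraries often differ in whitespace and reading order, so it is worth trying both on your own file.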