Using the snippet below, I've attempted to extract the text data from this PDF file.
    import pyPdf

    def get_text(path):
        # Load PDF into pyPDF
        pdf = p
As an alternative to PyPDF2, I suggest pdftotext:
pdftotext
    #!/usr/bin/env python
    """Use pdftotext to extract text from PDFs."""

    import pdftotext

    # pdftotext.PDF expects a file opened in binary mode
    with open("foobar.pdf", "rb") as f:
        pdf = pdftotext.PDF(f)

    # Iterate over all the pages
    for page in pdf:
        print(page)
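If you want the whole document as a single string rather than page by page, something like the following might work. It is only a sketch: `join_pages` and `extract_text` are hypothetical helper names, not part of pdftotext, and it assumes the `pdftotext.PDF` object yields one string per page when iterated, as in the loop above.

```python
def join_pages(pages, separator="\n\n"):
    """Join per-page strings into one string (hypothetical helper).

    Trailing whitespace on each page is trimmed so the separator
    is the only gap between consecutive pages.
    """
    return separator.join(page.rstrip() for page in pages)


def extract_text(path):
    """Return the text of the PDF at `path` as one string (hypothetical helper)."""
    # Third-party import kept local so join_pages is usable without it.
    import pdftotext

    # The file must be opened in binary mode.
    with open(path, "rb") as f:
        pdf = pdftotext.PDF(f)
        return join_pages(pdf)
```

Calling `extract_text("foobar.pdf")` should then give the same text the loop prints, with pages separated by a blank line.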