I'm trying to extract the text included in this PDF file using Python.
I'm using the PyPDF2 module, and have the following script:
imp
The text of a multi-page PDF can be extracted in a single pass, instead of passing each page number individually, using the code below:
import PyPDF2

# Legacy PyPDF2 API: these camelCase names were removed in PyPDF2 3.0.
with open('samples.pdf', 'rb') as pdf_file:
    read_pdf = PyPDF2.PdfFileReader(pdf_file)
    number_of_pages = read_pdf.getNumPages()
    # Walk every page index instead of hard-coding a single page number.
    for i in range(number_of_pages):
        page = read_pdf.getPage(i)
        page_content = page.extractText()
        print(page_content)
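If you are on a newer release, the same page-by-page loop can be written against the renamed API. This is only a sketch, assuming PyPDF2 2.0 or later (where `PdfReader`, `reader.pages` and `page.extract_text()` are available). The helper names `join_page_text` and `extract_pdf_text` are made up for illustration and are not part of the library.

```python
def join_page_text(pages, separator="\n"):
    """Concatenate the extracted text of every page object in `pages`.

    `pages` is any iterable of objects exposing an extract_text() method,
    such as the `pages` attribute of a PyPDF2 PdfReader.
    """
    chunks = []
    for page in pages:
        # Treat a missing result as an empty string so join() never fails.
        chunks.append(page.extract_text() or "")
    return separator.join(chunks)


def extract_pdf_text(path):
    """Hypothetical convenience wrapper: read `path` and return all its text."""
    # Imported lazily so join_page_text works even without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 2.0

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        return join_page_text(reader.pages)
```

Calling `extract_pdf_text('samples.pdf')` would then return the whole document as one string. How clean that text is still depends on how the PDF was produced.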