How can I read a PDF in Python? [duplicate]

对着背影说爱祢 submitted on 2019-12-20 10:34:51

Question


How can I read a PDF in Python? I know one way is to convert it to text first, but I want to read the content directly from the PDF.

Can anyone explain which Python module is best for PDF extraction?


Answer 1:


You can use the PyPDF2 package.

# install PyPDF2
pip install PyPDF2

# import the required module
import PyPDF2

# open the file in binary mode; the with block closes it automatically
with open('example.pdf', 'rb') as file:
    # create a PDF reader object
    # (older releases used PdfFileReader and numPages, which PyPDF2 3.0 removed)
    reader = PyPDF2.PdfReader(file)

    # print the number of pages in the PDF file
    print(len(reader.pages))

See the documentation: http://pythonhosted.org/PyPDF2/
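Since the question asks about reading the content rather than counting pages, here is a minimal sketch of per-page text extraction. It assumes a recent PyPDF2 release that provides `PdfReader`, `reader.pages` and `page.extract_text()`; the helper names `extract_pages` and `read_pdf_text` are made up for this illustration and are not part of the library.

```python
def extract_pages(reader):
    """Return a list with the text of each page of a PyPDF2-style reader."""
    # extract_text() can yield nothing useful for image-only (scanned) pages,
    # so fall back to an empty string in that case
    return [(page.extract_text() or "") for page in reader.pages]


def read_pdf_text(path):
    """Open the PDF at `path` and return its text, one string per page."""
    # Imported here so extract_pages() stays usable without PyPDF2 installed
    from PyPDF2 import PdfReader

    with open(path, "rb") as f:
        reader = PdfReader(f)
        # Extract while the file is still open; pages are read lazily
        return extract_pages(reader)
```

Calling `read_pdf_text('example.pdf')` should give one string per page. Text quality depends heavily on how the PDF was produced, and scanned documents usually need OCR instead.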




Answer 2:


You can use the textract module in Python.

To install it:

pip install textract

To read a PDF:

import textract
# process() returns the extracted text as bytes
text = textract.process('path/to/pdf/file', method='pdfminer')

For details, see the Textract documentation.
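One thing that tends to trip people up: under Python 3, `textract.process` returns `bytes`, not `str`. The sketch below decodes the result before use. The helper names `to_text` and `read_pdf_with_textract` are hypothetical, and UTF-8 is assumed as the output encoding.

```python
def to_text(raw, encoding="utf-8"):
    """Decode the bytes returned by textract into a str."""
    # errors="replace" keeps going if a byte sequence is not valid in `encoding`
    return raw.decode(encoding, errors="replace")


def read_pdf_with_textract(path):
    """Extract the text of the PDF at `path` using textract's pdfminer method."""
    # Imported here so to_text() stays usable without textract installed
    import textract

    raw = textract.process(path, method="pdfminer")
    return to_text(raw)
```

`read_pdf_with_textract('path/to/pdf/file')` would then return an ordinary string that can be searched or split into lines.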




Answer 3:


Try PyPDF2.

There is a good tutorial here: https://automatetheboringstuff.com/chapter13/



Source: https://stackoverflow.com/questions/45795089/how-can-i-read-pdf-in-python
