pypdf

PyPDF's PdfFileReader() having problems reading file, file not callable

我的未来我决定 submitted on 2021-02-10 14:18:54
Question: So here is my import:

    from pyPdf import PdfFileWriter, PdfFileReader

Here is where I write my PDF:

    filenamer = filename + '.pdf'
    pdf = PdfPages(filenamer)

(great naming convention, I know!) I write some things to it. I close it here:

    pdf.close()

Here is where I try to read it:

    input1 = PdfFileReader(file(filenamer, "rb"))

And here is the error:

    Traceback (most recent call last):
      File "./datamine.py", line 405, in <module>
        input1 = PdfFileReader(file(filenamer, "rb"))
    TypeError: 'file' object

Unable to use pypdf module

故事扮演 submitted on 2021-02-08 12:16:22
Question: I have installed the pyPdf module successfully using the command pip install pydf, but when I import the module I get the following error:

    enC:\Anaconda3\lib\site-packages\pyPdf\__init__.py in <module>()
          1 from pdf import PdfFileReader, PdfFileWriter
          2 __all__ = ["pdf"]
    ImportError: No module named 'pdf'

What should I do? I have installed the pdf module as well, but the error still does not go away.

Answer 1: This is a problem of PyPDF, which does not occur in PyPDF2. Actually,
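The traceback above shows `from pdf import ...` inside `pyPdf/__init__.py`. That is an implicit relative import, which Python 3 does not support, so the old pyPdf package fails there regardless of what else is installed. As a rough diagnostic sketch (the helper name `first_installed` is hypothetical, not part of any library), the standard library can report which of the two packages named in this thread is present without actually importing it:

```python
import importlib.util


def first_installed(candidates=("PyPDF2", "pyPdf")):
    """Return the first top-level package name that can be located, or None.

    find_spec() only locates a top-level package; it does not import it.
    So "pyPdf" being found does not mean it will import cleanly on Python 3.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    print("Usable PDF package found:", first_installed())
```

If this reports `PyPDF2`, switching the import to `from PyPDF2 import PdfFileReader, PdfFileWriter` matches the direction the answer points in.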

How do I apply my python code to all of the files in a folder at once, and how do I create a new name for each subsequent output file?

柔情痞子 submitted on 2021-02-05 06:20:06
Question: The code I am working with takes in a .pdf file and outputs a .txt file. My question is, how do I create a loop (probably a for loop) that runs the code on every file in a folder whose name ends in ".pdf"? Furthermore, how do I change the output each time the loop runs so that I write a new file each time, with the same name as the input file (i.e. 1_pet.pdf > 1_pet.txt, 2_pet.pdf > 2_pet.txt, etc.)? Here is the code so far:

    path="2_pet.pdf"
    content = getPDFContent(path)
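One possible shape for that loop, as a minimal sketch rather than a definitive answer: `convert_folder` and its `extract` parameter are hypothetical names, and `extract` stands in for the asker's `getPDFContent`, whose body is not shown in the excerpt. The sketch assumes that function takes a path string and returns text.

```python
from pathlib import Path


def convert_folder(folder, extract):
    """Run `extract` on every .pdf in `folder` and write a matching .txt file.

    `extract` is a placeholder for a function like the question's
    getPDFContent(path): it receives a path string and returns text.
    Returns the names of the .txt files that were written.
    """
    written = []
    # sorted() keeps the processing order predictable across platforms.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # Swap only the extension: 1_pet.pdf -> 1_pet.txt
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(extract(str(pdf_path)), encoding="utf-8")
        written.append(txt_path.name)
    return written
```

Calling `convert_folder("some_folder", getPDFContent)` would then process each PDF in turn. Note that `glob("*.pdf")` is case-sensitive on some platforms, so files ending in `.PDF` may be skipped.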

textract failed with exit code 127: pdftotext on windows 10

杀马特。学长 韩版系。学妹 submitted on 2021-01-29 18:21:56
Question: I am trying to run a Python program on a Windows 10 machine that reads and converts PDF files. However, every time I run the program I get the following error, and I have not found out how to resolve it yet. Can anyone help me, please? :)

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 82, in run
        pipe = subprocess.Popen(
      File "C:
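Exit code 127 conventionally means "command not found", and the traceback shows textract launching an external program through `subprocess.Popen`. A plausible reading, offered as an assumption since the full traceback is cut off, is that the `pdftotext` executable is not on the PATH. A small check like the following (the helper name `find_tool` is hypothetical) can confirm that before calling textract:

```python
import shutil


def find_tool(name="pdftotext"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    if location is None:
        # Assumption: textract needs this external binary for PDF parsing.
        print("pdftotext was not found on PATH; install it or add its folder to PATH.")
    else:
        print("pdftotext found at", location)
```

If the check prints a path and the error persists, the cause lies elsewhere and the rest of the traceback would be needed to narrow it down.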

Windows Error: 32 when trying to rename file in python

人盡茶涼 submitted on 2021-01-29 07:00:54
Question: I'm trying to rename some PDF files using pyPdf, and my code seems to work fine until it reaches the rename statement. The while/if block looks for the page number where the string "This string" is located and stops when it is found. With the page number in hand, the "new name" is created. My issue is that even though the with block is supposed to close the file automatically, I get the error below when the rename statement is reached:

    Traceback (most recent call last):
      File "<stdin>", line 14, in
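Windows error 32 means the file is in use by another process or handle. The asker's code is not shown in the excerpt, so the cause here is an assumption: either the rename runs while the with block is still active, or a second handle was opened outside it. A minimal sketch of the safer ordering, with hypothetical names (`rename_after_close`, `make_new_name`), looks like this:

```python
import os


def rename_after_close(path, make_new_name):
    """Read `path`, work out a new name, and rename once the file is closed.

    `make_new_name` is a hypothetical callback: it receives the open binary
    file object and returns the new file name (a name, not a full path).
    """
    with open(path, "rb") as handle:
        # Do all reading here, e.g. build the PDF reader on `handle`
        # and search the pages for the target string.
        new_name = make_new_name(handle)
    # The with block has ended, so the handle is closed before renaming.
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)
    return new_path
```

The key design choice is that only a plain string leaves the with block, so nothing that still references the open file survives into the rename.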

PyPDF2 won't extract all text from PDF

别来无恙 submitted on 2020-12-01 11:46:29
Question: I'm trying to extract text from a PDF (https://www.sec.gov/litigation/admin/2015/34-76574.pdf) using PyPDF2, and the only result I'm getting is the following string: b'' Here is my code:

    import PyPDF2
    import urllib.request
    import io

    url = 'https://www.sec.gov/litigation/admin/2015/34-76574.pdf'
    remote_file = urllib.request.urlopen(url).read()
    memory_file = io.BytesIO(remote_file)
    read_pdf = PyPDF2.PdfFileReader(memory_file)
    number_of_pages = read_pdf.getNumPages()
    page = read_pdf.getPage(1)
