pypdf | 易学教程

PyPDF's PdfFileReader() having problems reading file, file not callable

Question: So here is my import:

    from pyPdf import PdfFileWriter, PdfFileReader

Here is where I write my PDF:

    filenamer = filename + '.pdf'
    pdf = PdfPages(filenamer)

(great naming convention, I know!) I write some things to it. I close it here:

    pdf.close()

Here is where I try to read it:

    input1 = PdfFileReader(file(filenamer, "rb"))

And here is the error:

    Traceback (most recent call last):
      File "./datamine.py", line 405, in <module>
        input1 = PdfFileReader(file(filenamer, "rb"))
    TypeError: 'file' object
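
A "not callable" TypeError on a call like file(...) usually means the name has been rebound to an object that is not a function. Whether that happened in the asker's script is an assumption, since the snippet is cut off. The sketch below reproduces the same class of error in Python 3 with a hypothetical variable named opener, then reads the file with the built-in open() instead. The placeholder file it writes is not a real PDF.

```python
import os
import tempfile

# Write a small placeholder file so the sketch is self-contained (not a real PDF).
path = os.path.join(tempfile.mkdtemp(), "example.pdf")
with open(path, "wb") as out:
    out.write(b"%PDF-1.4\n")

# Rebinding a name that is later used as a function reproduces the error class.
opener = open                 # a callable
opener = opener(path, "rb")   # now a file object, no longer callable
try:
    opener(path, "rb")
    error = None
except TypeError as exc:
    error = str(exc)
finally:
    opener.close()

# The built-in open() with a distinct variable name avoids the clash.
with open(path, "rb") as stream:
    header = stream.read(4)

print(error)
print(header)
```

In the question's code, the object returned by open(filenamer, "rb") would take the place of file(filenamer, "rb") as the argument to PdfFileReader.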

Unable to use pypdf module

Question: I have installed the pyPdf module successfully using the command pip install pydf, but when I import the module I get the following error:

    C:\Anaconda3\lib\site-packages\pyPdf\__init__.py in <module>()
          1 from pdf import PdfFileReader, PdfFileWriter
          2 __all__ = ["pdf"]

    ImportError: No module named 'pdf'

What should I do? I have installed the pdf module as well, but the error still does not go away.

Answer 1: This is a problem with PyPDF that does not occur in PyPDF2. Actually,
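
The failing line, from pdf import ..., sits inside pyPdf's own __init__.py. Python 3 treats such an import as absolute, which is a likely reason it cannot find a sibling module named pdf. The answer is cut off, so this reading is an assumption. As a rough sketch, the helper below (first_importable is a hypothetical name) reports which of several candidate package names can be located in the current environment without importing them.

```python
import importlib.util


def first_importable(candidates):
    """Return the first module name in `candidates` that can be located, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # Treat lookup failures the same as "not installed".
            continue
    return None


# The candidate names are assumptions about what might be installed.
print(first_importable(["PyPDF2", "pypdf", "pyPdf"]))
```

If this prints None, none of the listed packages is visible to the interpreter that runs the script.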

How do I apply my python code to all of the files in a folder at once, and how do I create a new name for each subsequent output file?

Question: The code I am working with takes in a .pdf file and outputs a .txt file. My question is: how do I create a loop (probably a for loop) that runs the code on every file in a folder whose name ends in ".pdf"? Furthermore, how do I change the output each time the loop runs so that I write a new file each time, with the same name as the input file (i.e. 1_pet.pdf > 1_pet.txt, 2_pet.pdf > 2_pet.txt, etc.)? Here is the code so far:

    path = "2_pet.pdf"
    content = getPDFContent(path)
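
One way to approach this, as a minimal sketch: iterate over the folder with pathlib and derive each output name from the input name. The function convert_folder and its extract_text parameter are hypothetical. extract_text stands in for the asker's getPDFContent, whose body is not shown, and is assumed to take a path string and return text.

```python
from pathlib import Path


def convert_folder(folder, extract_text):
    """Run `extract_text` on every .pdf in `folder` and write a .txt with the same stem."""
    written = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")  # 1_pet.pdf -> 1_pet.txt
        txt_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
        written.append(txt_path)
    return written
```

A call such as convert_folder(".", getPDFContent) would then process every PDF in the current directory. Note that glob("*.pdf") does not match an upper-case ".PDF" extension on case-sensitive file systems.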

textract failed with exit code 127: pdftotext on windows 10

Question: I am trying to run a Python program on a Windows 10 machine that reads and converts PDF files. However, every time I run the program I get the following error, and I have not found out how to resolve it yet. Can anyone help, please? :)

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 82, in run
        pipe = subprocess.Popen(
      File "C:
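
Exit code 127 is the status shells conventionally use for "command not found", and the title names pdftotext as the program that failed. A quick check, sketched below under the assumption that the tool is simply missing from PATH, is to ask Python where it would find the executable. The helper name find_tool is hypothetical.

```python
import shutil


def find_tool(name="pdftotext"):
    """Return the full path of an executable on PATH, or None if it cannot be found."""
    return shutil.which(name)


location = find_tool()
if location is None:
    print("pdftotext was not found on PATH")
else:
    print("pdftotext found at", location)
```

pdftotext is distributed with the Poppler and Xpdf tool sets. If the check reports that it is missing, installing one of them and adding its folder to PATH is the usual next step.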

Windows Error: 32 when trying to rename file in python

Question: I'm trying to rename some PDF files using pyPdf, and my code seems to work fine until it reaches the rename statement. The while/if block looks for the page number where the string "This string" is located and stops when it finds it. With the page number in hand, the new name is created. My issue is that even though the with block is supposed to close the file automatically, when execution reaches the rename statement I get the error below:

    Traceback (most recent call last):
      File "<stdin>", line 14, in
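
Windows error 32 means the file is still in use by a process, so a rename fails while any handle to it remains open. The asker's code is not shown, so the following is only a sketch of one common arrangement: work out the new name inside the with block and call os.rename after the block has exited. The names rename_after_reading and choose_name are hypothetical, and choose_name stands in for the page-search logic.

```python
import os


def rename_after_reading(path, choose_name):
    """Read `path` inside a with block, then rename it only after the handle is closed."""
    with open(path, "rb") as handle:
        new_name = choose_name(handle)  # hypothetical callback that inspects the file
    # The with block has exited here, so this handle no longer holds the file.
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)
    return new_path
```

If the error persists with this layout, another handle to the same file (for example one kept alive by a reader object, or by a different program) may still be open.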

PyPDF2 won't extract all text from PDF

Question: I'm trying to extract text from a PDF (https://www.sec.gov/litigation/admin/2015/34-76574.pdf) using PyPDF2, and the only result I'm getting is the following string:

    b''

Here is my code:

    import PyPDF2
    import urllib.request
    import io

    url = 'https://www.sec.gov/litigation/admin/2015/34-76574.pdf'
    remote_file = urllib.request.urlopen(url).read()
    memory_file = io.BytesIO(remote_file)
    read_pdf = PyPDF2.PdfFileReader(memory_file)
    number_of_pages = read_pdf.getNumPages()
    page = read_pdf.getPage(1)