pypdf | 易学教程 (Easy Learning Tutorial)

Pypdf extracts code from one PDF, but not from another?

Question: I am trying to build a primitive crawler for my own PDF files. For that, I use PyPDF to extract the data (customer, product, amount, etc.) and then work with it. The code is pretty simple, but it doesn't seem able to extract anything from my PDFs, while it works on a random PDF I found on Google. I tried several of my own documents and none of them work; random PDFs from the internet do. I use Spyder. Below is the code I am using:
import PyPDF2 as p2
PDFfile=open("pdf …

How to insert a “missing” page as blank page in PDF with Python?

Question: Say you have to join pages numbered 2, 4 and 5 (the files are named test_002.pdf, test_004.pdf and test_005.pdf); then page 3 is missing. What I am trying to do is get the result of these commands:
pdfjam --nup 2 --papersize '{47cm,30cm}' --scale 1.0 test_002.pdf test_003.pdf --outfile joined_002-003.pdf
pdfjam --nup 2 --papersize '{47cm,30cm}' --scale 1.0 test_004.pdf test_005.pdf --outfile joined_004-005.pdf
which join each even page and the following odd page into one single page …
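
A sketch of the planning step, assuming the files always follow the test_NNN.pdf pattern with three-digit numbers: it finds the gaps in the page sequence and builds the pdfjam argument lists, keeping the missing page's name (test_003.pdf here) so that a blank placeholder file can be created under that name beforehand, for example with pypdf's `PdfWriter.add_blank_page`. `plan_joins` is a hypothetical helper; only the pdfjam options are taken from the question.

```python
import re


def plan_joins(filenames, prefix="test_", suffix=".pdf", width=3):
    """Return (missing page numbers, pdfjam argument lists) for 2-up joining."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix) + r"$")
    present = {int(m.group(1)) for m in map(pattern.match, filenames) if m}
    if not present:
        return [], []
    first, last = min(present), max(present)
    if (last - first) % 2 == 0:
        last += 1  # odd number of slots: pad so the final page still has a partner
    missing = [n for n in range(first, last + 1) if n not in present]

    def name(n):
        # Zero-pad to the same width as the existing files (002, 003, ...).
        return f"{prefix}{n:0{width}d}{suffix}"

    commands = []
    for left in range(first, last + 1, 2):
        right = left + 1
        commands.append([
            "pdfjam", "--nup", "2", "--papersize", "{47cm,30cm}", "--scale", "1.0",
            name(left), name(right),
            "--outfile", f"joined_{left:0{width}d}-{right:0{width}d}.pdf",
        ])
    return missing, commands


missing, commands = plan_joins(["test_002.pdf", "test_004.pdf", "test_005.pdf"])
print(missing)
for cmd in commands:
    print(" ".join(cmd))
```

Returning argument lists rather than shell strings means each command can be passed straight to `subprocess.run`, so the `{47cm,30cm}` paper size needs no shell quoting.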

Export Pandas DataFrame into a PDF file using Python

Question: What is an efficient way to generate a PDF from data frames in pandas? Answer 1: Well, one way is to use Markdown. You can use df.to_html(), which converts the DataFrame into an HTML table. You can then put the generated HTML into a Markdown file (.md) (see http://daringfireball.net/projects/markdown/basics). From there, there are utilities to convert Markdown into a PDF (https://www.npmjs.com/package/markdown-pdf). One all-in-one tool for this method is the Atom text editor (https://atom.io/) …
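
The first half of that answer, wrapping `df.to_html()` in a Markdown document, can be sketched as below, assuming pandas is installed. Markdown passes block-level raw HTML such as a `<table>` through unchanged, which is why the approach works; `dataframe_to_markdown` is a hypothetical helper, and the PDF conversion itself is left to the tool the answer links.

```python
import pandas as pd


def dataframe_to_markdown(df, title="Report"):
    """Return a Markdown document with a heading followed by df as an HTML table."""
    # The blank lines around the table keep Markdown from treating it as inline text.
    return f"# {title}\n\n{df.to_html(index=False)}\n"


df = pd.DataFrame({"product": ["A", "B"], "amount": [3, 5]})
print(dataframe_to_markdown(df))
```

Saving that string to a file such as report.md gives the input that a Markdown-to-PDF converter expects.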

PDF - Remove White Margins

Question: I would like to know a way to remove white margins from a PDF file, just like Adobe Acrobat X Pro does. I understand it will not work with every PDF file. I would guess that the way to do it is by getting the text margins and then cropping to those margins. PyPdf is preferred. iText finds text margins based on this code:
public void addMarginRectangle(String src, String dest)
    throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfReaderContentParser parser = new …
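
Once the text boxes on a page are known (from iText's parser as in the question, or from another extractor), computing the crop is plain geometry: take the union of all boxes, add some padding and clamp the result to the page. A minimal sketch of that step, assuming boxes are given as (x0, y0, x1, y1) in PDF units; `crop_box_from_text` and the 10-unit default padding are hypothetical choices.

```python
def crop_box_from_text(text_boxes, media_box, padding=10.0):
    """Return (x0, y0, x1, y1) enclosing all text boxes plus padding, clamped to the page.

    All boxes are (x0, y0, x1, y1) in PDF units with the origin at the bottom-left.
    """
    boxes = list(text_boxes)
    if not boxes:
        return tuple(media_box)  # nothing measured: leave the page untouched
    mx0, my0, mx1, my1 = media_box
    x0 = min(b[0] for b in boxes) - padding
    y0 = min(b[1] for b in boxes) - padding
    x1 = max(b[2] for b in boxes) + padding
    y1 = max(b[3] for b in boxes) + padding
    # Never grow the crop beyond the physical page.
    return (max(mx0, x0), max(my0, y0), min(mx1, x1), min(my1, y1))


print(crop_box_from_text([(100, 100, 200, 150), (120, 400, 300, 420)], (0, 0, 612, 792)))
```

In pypdf the result can be applied by setting `page.cropbox.lower_left` and `page.cropbox.upper_right`. Because only text is measured, images or drawings outside the text area would be cut off, which is one reason this cannot work for every PDF.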

How to get bookmark of PDF and add bookmark to new pdf?

Question: I am merging one PDF into another PDF. It works fine, but the bookmarks are missing from the final PDF. This is the PDF generation code:
#- Create One Page PDF with some text
from reportlab.pdfgen import canvas as canx
c = canx.Canvas('transparent.pdf')
c.setStrokeColor((1, 0, 0))
transparentwhite = canx.Color(255, 255, 255, alpha = 0.0)
c.setFillColor(transparentwhite)
t = c.beginText()
t.setTextRenderMode(2)
c._code.append(t.getCode())
c.setFont('Helvetica', 48)
c.saveState()
c.translate(100 …

Python formatWarning and cross-package errors

Question: Okay, I am confused. I am using two Python packages, PyPDF2 and SQLAlchemy. SQLAlchemy raises a warning using Python's warnings.warn(), and this somehow calls a formatWarning() function in PyPDF2, which also uses Python's warnings.warn(). Is this an error in SQLAlchemy or in PyPDF2? How does this even happen? Is formatWarning some special function? PyPDF2 defines it as:
#custom implementation of warnings.formatwarning
def formatWarning(message, category, filename, lineno, line=None):
    file = …

Python read part of a pdf page

Question: I'm trying to read a PDF file where each page is divided into 3x3 blocks of information of the form
A | B | C
D | E | F
G | H | I
Each of the entries is broken into multiple lines. A simplified example of one entry is this card, but there would be similar entries in the other 8 slots. I've looked at pdfminer and PyPDF2. I haven't found pdfminer very useful, but PyPDF2 has given me something close:
import PyPDF2
from StringIO import StringIO
def getPDFContent(path):
    content = ""
    p = …
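
If each text fragment comes with a position, the layout can be recovered geometrically: split the page into three equal bands in each direction and drop every fragment into its cell. This sketch assumes equal-sized cells and fragments given as (x, y, text) in PDF coordinates; obtaining the positions is left open. For example, pypdf's `page.extract_text(visitor_text=...)` calls back with the text together with the current transformation matrix and the text matrix, from which a position can be derived. `bucket_into_grid` is a hypothetical helper.

```python
LABELS = "ABCDEFGHI"  # row-major: A|B|C is the top row, G|H|I the bottom row


def bucket_into_grid(fragments, page_width, page_height):
    """fragments: iterable of (x, y, text) in PDF units, origin at the bottom-left."""
    cells = {label: [] for label in LABELS}
    # Visit fragments top-to-bottom, then left-to-right, so lines keep reading order.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        col = min(max(int(x / page_width * 3), 0), 2)
        # PDF y grows upward, so rows are measured from the top edge of the page.
        row = min(max(int((page_height - y) / page_height * 3), 0), 2)
        cells[LABELS[row * 3 + col]].append(text)
    return {label: "\n".join(parts) for label, parts in cells.items()}


grid = bucket_into_grid(
    [(10, 280, "A2"), (10, 290, "A1"), (150, 150, "E1"), (290, 10, "I1")], 300, 300
)
print(grid["A"])
```

Real cards may not be equally sized; in that case the band boundaries would have to be measured from the page instead of assumed to be thirds.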

Place a vertical or rotated text in a PDF with Python

Question: I'm currently generating a PDF with PyFPDF. I also need to add vertical/rotated text. Unfortunately, as far as I can see, this is not directly supported in PyPDF. There are solutions for FPDF in PHP. Is there a way to insert vertical or rotated text into a PDF from Python, either with PyFPDF or with another library? Answer 1: I believe you can do so with PyMuPDF. I've inserted text with the module before, but not rotated text. There is a rotate parameter in the insertText method, so hopefully it will work …
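
Whatever library is used, rotated text in a PDF comes down to the text matrix: the `Tm` operator takes six numbers a b c d e f, and rotating by an angle θ about the text origin (e, f) uses a = cos θ, b = sin θ, c = -sin θ, d = cos θ, with positive angles turning counter-clockwise. The hypothetical helper below only builds the content-stream operators. It assumes the page already has a font resource named /F1, and embedding the snippet into a page is left to the library that writes the PDF (ReportLab's canvas, for instance, offers `rotate()` for the same purpose).

```python
import math


def _pdf_number(value):
    # PDF numbers may not use exponent notation; 4 decimals is plenty for positions.
    # Adding 0.0 turns a rounded -0.0 into 0.0 so it never prints as "-0".
    text = f"{round(value, 4) + 0.0:.4f}"
    return text.rstrip("0").rstrip(".")


def rotated_text_ops(text, x, y, angle_deg, font="/F1", size=12):
    """Build PDF content-stream operators that draw `text` rotated about (x, y)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Text matrix [a b c d e f] = [cos sin -sin cos x y]: rotation, then translation.
    matrix = " ".join(_pdf_number(v) for v in (cos_t, sin_t, -sin_t, cos_t, x, y))
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"BT {font} {_pdf_number(size)} Tf {matrix} Tm ({escaped}) Tj ET"


print(rotated_text_ops("Vertical", 100, 200, 90))
```

An angle of 90 makes the text run from bottom to top; 270 (or -90) makes it run from top to bottom.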

Python: How to replace text in pdf

Question: I have a PDF file, and I want to replace some text in it and generate a new PDF. How can I do that in Python? I have tried ReportLab, but ReportLab does not have any function to search for text and replace it. What other module can I use? Answer 1: You can try the Aspose.PDF Cloud SDK for Python. Aspose.PDF Cloud is a REST API PDF processing solution. It is a paid API, and its free plan provides 50 credits per month. I'm a developer evangelist at Aspose.
import os
import asposepdfcloud
from …

Split a PDF based on outline

Question: I would like to use pyPdf to split a PDF file based on its outline, where each destination in the outline refers to a different page within the PDF. Example outline:
main --> points to page 1
sect1 --> points to page 1
sect2 --> points to page 15
sect3 --> points to page 22
It is easy in pyPdf to iterate over each page of the document or over each destination in the document's outline; however, I cannot figure out how to get the page number a destination points to. Does anybody know how to …
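
In current pypdf, `PdfReader.outline` lists the destinations and `PdfReader.get_destination_page_number(destination)` returns the 0-based page index each one points to (add 1 to match the numbering above). Turning those start pages into page ranges for splitting is then plain Python. The sketch assumes a flat outline (nested entries would first need flattening) and a hypothetical total of 30 pages; `outline_ranges` is a hypothetical helper.

```python
import bisect


def outline_ranges(entries, total_pages):
    """Map (title, first_page) outline entries to (title, first, last) page ranges.

    Pages are 1-based, as in the example outline. Each entry runs until the page
    before the next *later* start page, so entries sharing a start page (here
    "main" and "sect1") get the same range.
    """
    starts = sorted({first for _, first in entries})
    ranges = []
    for title, first in entries:
        i = bisect.bisect_right(starts, first)  # index of the first strictly later start
        last = starts[i] - 1 if i < len(starts) else total_pages
        ranges.append((title, first, last))
    return ranges


outline = [("main", 1), ("sect1", 1), ("sect2", 15), ("sect3", 22)]
for title, first, last in outline_ranges(outline, total_pages=30):
    print(title, first, last)
```

Each range can then be written out as its own file by adding pages first-1 through last-1 (0-based) from the reader to a fresh `PdfWriter`.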