Add text to existing PDF document in Python

问题

~~I'm trying to convert a pdf to the same size as my pdf which is an A4 page.~~

convert my_pdf.pdf -density 300x300 -page A4 my_png.png

The resulting png file, however, is 595px × 842px which should be the resolution at 72 dpi. I was thinking of using PIL to write some text on some of the pdf fields and convert it back to PDF. But currently the image is coming out wrong.

Edit: I was approaching the problem from the wrong angle. The correct approach didn't include imagemagick at all.

回答1:

You should look at Add text to Existing PDF using Python and also Python as PDF Editing and Processing Framework. These will point you in the right direction.

If you do what you've proposed in the question, when you export back to .pdf, it will really just be an image file embedded in a .pdf, it won't be text.

回答2:

After searching around some I finally found the solution: It turns out that this was the correct approach after all. Yet, i feel that it wasn't verbose enough. It appears that the poster probably took it from here (same variable names etc).

The idea: create new blank PDF with Reportlab which only contains a text string. Then merge/add it as a watermark using pyPdf.

from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100,100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("mypdf.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("/home/joe/newpdf.pdf", "wb")
output.write(outputStream)
outputStream.close()

Hope this helps somebody else.

回答3:

I just tried the solution above, but I had quite some troubles to get it running in Python3. So, I would like to share my modifications. The adapted code looks as follows:

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()

# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100, 100, "Hello world")
can.save()

# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("mypdf.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page2 = new_pdf.getPage(0)
page.mergePage(page2)
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("newpdf.pdf", "wb")
output.write(outputStream)
outputStream.close()

Now the page.mergePage throws an error. Turns out to be a porting error in pypdf2. Please refer to this question for the solution: Porting to Python3: PyPDF2 mergePage() gives TypeError

回答4:

pdfrw will let you take existing PDFs and place them as form XObjects (similar to images) on a reportlab canvas. There are some examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer -- I am the pdfrw author.

来源：https://stackoverflow.com/questions/6819336/add-text-to-existing-pdf-document-in-python

标签

python

pdf-generation

ImageMagick