How do I change hyperlinks inside a PDF using Python?

Submitted by 五迷三道 on 2020-01-02 07:03:35

Question


How do I change the hyperlinks in a PDF using Python? I am currently using PyPDF2 to open the file and loop through the pages. How do I actually scan for hyperlinks and then change them?


Answer 1:


So I couldn't get this working with the PyPDF2 library.

I did, however, get something working with another library, pdfrw. It installed fine for me with pip on Python 3.6:

pip install pdfrw

Note: for the following I used an example PDF I found online that contains multiple links. Your mileage may vary with other files.
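To get a quick idea of which web links a test file contains (the "scan for hyperlinks" part of the question) without any PDF library, you can search the raw file bytes for `/URI` entries, since a link's URI action stores its target as `/URI (...)`. This is a crude sketch under several assumptions: the annotation dictionaries are stored uncompressed (PDFs that pack objects into compressed object streams will not show up), only literal `(...)` strings are handled (not hex strings `<...>`), and unescaped balanced parentheses inside a URL, which PDF syntax allows, will cut a match short. `find_uris` and `unescape_literal` are hypothetical helper names.

```python
import re
from typing import List

# Matches "/URI" followed by a PDF literal string "( ... )". Inside the string
# we accept either a backslash escape (backslash + any byte) or any byte that
# is neither a backslash nor a closing parenthesis.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL)


def unescape_literal(raw: bytes) -> str:
    """Undo the \\(, \\) and \\\\ escapes of a PDF literal string.

    Other escapes (\\n, octal codes, ...) are deliberately left untouched.
    """
    return re.sub(rb"\\([()\\])", rb"\1", raw).decode("latin-1")


def find_uris(pdf_bytes: bytes) -> List[str]:
    """Return every /URI target found in uncompressed PDF bytes, in file order."""
    return [unescape_literal(m.group(1)) for m in URI_PATTERN.finditer(pdf_bytes)]
```

To run it on a file, read the file in binary mode and pass the bytes in, for example `find_uris(pathlib.Path("pdf.pdf").read_bytes())`. An empty result does not prove the file has no links; it may just mean the objects are compressed, which is one reason a real PDF parser such as pdfrw is the better tool for the actual rewrite.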

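import pdfrw

pdf = pdfrw.PdfReader("pdf.pdf")  # Load the PDF
new_pdf = pdfrw.PdfWriter()       # Create an empty output PDF

for page in pdf.pages:  # Go through the pages
    # Links are stored in the page's /Annots array. Pages without any
    # annotations have Annots == None, hence the "or []".
    for annot in page.Annots or []:
        # Skip annotations that are not links, and links whose action is not
        # a URI action (e.g. internal GoTo links), since those have no /URI.
        if annot.Subtype != "/Link" or annot.A is None or annot.A.S != "/URI":
            continue

        # decode() turns the raw PDF string into a normal Python str.
        old_url = annot.A.URI.decode()

        # >Here you put logic for replacing the URLs<

        # PdfString.encode() does the PDF string encoding for us,
        # adding the delimiters and escapes the raw PDF syntax needs.
        new_url = pdfrw.PdfString.encode("http://www.google.com")

        # Override the URL with ours.
        annot.A.URI = new_url

    new_pdf.addpage(page)

new_pdf.write("new.pdf")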
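The loop above leaves the actual replacement rule as a placeholder ("Here you put logic for replacing the URLs"). One common rule is moving links from an old host to a new one while keeping the path, query and fragment. Here is a minimal sketch of that using only the standard library; `rewrite_url` and `HOST_MAP` are hypothetical names, and the mapping is matched against the exact network location, so a host with an explicit port or user info would need its own entry.

```python
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from old host names to new ones.
HOST_MAP = {
    "old.example.com": "new.example.com",
}


def rewrite_url(old_url: str, host_map: Mapping[str, str] = HOST_MAP) -> str:
    """Swap the host of old_url according to host_map, keeping everything else."""
    parts = urlsplit(old_url)
    new_host = host_map.get(parts.netloc)
    if new_host is None:
        return old_url  # Leave links to other hosts untouched.
    # SplitResult is a namedtuple, so _replace() gives a copy with a new netloc.
    return urlunsplit(parts._replace(netloc=new_host))
```

Inside the pdfrw loop you would pass the decoded URL (for example `annot.A.URI.decode()`) through this function and encode the result with `pdfrw.PdfString.encode(...)` before assigning it back to `annot.A.URI`.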

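The comment about `PdfString` doing the encoding refers to PDF's literal string syntax: the text is wrapped in parentheses, and backslashes and parentheses inside it are escaped with a backslash. The sketch below shows what that amounts to for a URL. It is an illustration under assumptions, not pdfrw's implementation: `encode_pdf_literal` is a hypothetical name, it always escapes parentheses (the spec also permits unescaped balanced pairs), and it only accepts ASCII, since a URI action's target is meant to be a 7-bit ASCII string.

```python
def encode_pdf_literal(text: str) -> bytes:
    """Encode an ASCII string as a PDF literal string: (...) with escapes."""
    escaped = (
        text.replace("\\", "\\\\")  # Escape backslashes first...
        .replace("(", "\\(")        # ...then the string delimiters.
        .replace(")", "\\)")
    )
    # Raises UnicodeEncodeError for non-ASCII input instead of guessing.
    return ("(" + escaped + ")").encode("ascii")
```

A URL containing non-ASCII characters should be percent-encoded first, for example with `urllib.parse.quote`, before being stored as a link target.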

Source: https://stackoverflow.com/questions/45191215/how-do-i-change-hyperlinks-inside-pdf-using-python