Editing Hyperlinks and Anchors in a PDF Using iTextSharp

感动是毒 2020-11-30 14:17

I am using the iTextSharp library and C#.NET to split my PDF file.

Consider a PDF file named sample.pdf containing 72 pages. This sample.pdf contains pages that hav

3 Answers
  •  -上瘾入骨i
    2020-11-30 15:00

    Alright, based on what @Mark Storer said, here's some starter code. The first method creates a sample PDF with 10 pages and some links on the first page that jump to different parts of the PDF, so we have something to work with. The second method opens the PDF created by the first method, walks through each annotation, tries to figure out which page the annotation links to, and writes that page number to the Trace output. The code is in VB but should be easy to convert to C# if needed. It targets iTextSharp 5.1.1.0.
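    The page arithmetic that the second method applies to named actions can be pulled out into a small function and checked on its own. This is a minimal, library-free sketch: it assumes the standard PDF action names FirstPage, PrevPage, NextPage and LastPage, and ResolveNamedAction is a hypothetical helper, not an iTextSharp API.

```vb
Option Strict On

Imports System

Module NamedActionDemo
    ' Hypothetical helper (not part of iTextSharp): maps a standard PDF named action
    ' to a 1-based target page, clamping to the document the same way ListPdfLinks does.
    ' Returns -1 for any other name.
    Function ResolveNamedAction(actionName As String, currentPage As Integer, pageCount As Integer) As Integer
        Select Case actionName
            Case "FirstPage"
                Return 1
            Case "PrevPage"
                ' Never go before the first page
                Return Math.Max(currentPage - 1, 1)
            Case "NextPage"
                ' Never go past the last page
                Return Math.Min(currentPage + 1, pageCount)
            Case "LastPage"
                Return pageCount
            Case Else
                Return -1
        End Select
    End Function

    Sub Main()
        ' The four named links placed on page 1 of the 10-page sample
        For Each name As String In New String() {"FirstPage", "NextPage", "PrevPage", "LastPage"}
            Console.WriteLine("{0} -> {1}", name, ResolveNamedAction(name, 1, 10))
        Next
    End Sub
End Module
```

    Running it should print FirstPage -> 1, NextPage -> 2, PrevPage -> 1 and LastPage -> 10, the same targets ListPdfLinks reports for the four named links on page 1 of the sample.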

    If I get a chance I might try to take this further and actually split and re-link things but I don't have time right now.
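    For the split-and-re-link step, which this answer does not implement, the core bookkeeping is translating an old target page number into its position in the new file. A minimal sketch, assuming each output file holds one contiguous page range of the original; RemapTargetPage is a hypothetical helper, and a result of -1 marks a link whose target ended up in a different file and would have to be removed or rewritten.

```vb
Option Strict On

Imports System

Module SplitRemapDemo
    ' Hypothetical helper: when pages firstPage..lastPage (1-based, inclusive) of the
    ' original file are copied into a new file, returns the new page number for a link
    ' that pointed at oldTarget, or -1 if that page is not in this part.
    Function RemapTargetPage(oldTarget As Integer, firstPage As Integer, lastPage As Integer) As Integer
        If oldTarget < firstPage OrElse oldTarget > lastPage Then Return -1
        ' Pages keep their order, so only the offset changes
        Return oldTarget - firstPage + 1
    End Function

    Sub Main()
        ' Example: the part holding original pages 25..48 of a 72-page file
        Console.WriteLine(RemapTargetPage(30, 25, 48))  ' 6
        Console.WriteLine(RemapTargetPage(60, 25, 48))  ' -1: page 60 went to another part
    End Sub
End Module
```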

    Option Explicit On
    Option Strict On
    
    Imports iTextSharp.text
    Imports iTextSharp.text.pdf
    Imports System.Diagnostics
    Imports System.IO
    
    Public Class Form1
        ''//Folder that we are working in
        Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
        ''//Sample PDF
        Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Sample.pdf")
    
        Private Shared Sub CreateSamplePdf()
            ''//Create our output directory if it does not exist
            Directory.CreateDirectory(WorkingFolder)
    
            ''//Create our sample PDF
            Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
                Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
                    Using writer = PdfWriter.GetInstance(Doc, FS)
                        Doc.Open()
    
                        ''//Turn our hyperlinks blue
                        Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)
    
                        ''//Create 10 pages with simple labels on them
                        For I = 1 To 10
                            Doc.NewPage()
                            Doc.Add(New Paragraph(String.Format("Page {0}", I)))
                            ''//On the first page add some links
                            If I = 1 Then
    
                                ''//Go to pages relative to this page
                                Doc.Add(New Paragraph(New Chunk("First Page", BlueFont).SetAction(New PdfAction(PdfAction.FIRSTPAGE))))
    
                                Doc.Add(New Paragraph(New Chunk("Next Page", BlueFont).SetAction(New PdfAction(PdfAction.NEXTPAGE))))
    
                                Doc.Add(New Paragraph(New Chunk("Prev Page", BlueFont).SetAction(New PdfAction(PdfAction.PREVPAGE)))) ''//There is no page before page 1, but this is here for completeness
    
                                Doc.Add(New Paragraph(New Chunk("Last Page", BlueFont).SetAction(New PdfAction(PdfAction.LASTPAGE))))
    
                                ''//Go to a specific hard-coded page number (PdfDestination.FIT fits the whole page in the window)
                                Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(PdfDestination.FIT), writer))))
                            End If
                        Next
                        Doc.Close()
                    End Using
                End Using
            End Using
        End Sub
    
        Private Shared Sub ListPdfLinks()
    
            ''//Setup some variables to be used later
            Dim R As PdfReader
            Dim PageCount As Integer
            Dim PageDictionary As PdfDictionary
            Dim Annots As PdfArray
    
            ''//Open our reader
            R = New PdfReader(BaseFile)
            ''//Get the page count
            PageCount = R.NumberOfPages
    
            ''//Loop through each page
            For I = 1 To PageCount
                ''//Get the current page
                PageDictionary = R.GetPageN(I)
    
                ''//Get all of the annotations for the current page
                Annots = PageDictionary.GetAsArray(PdfName.ANNOTS)
    
                ''//Make sure we have something
                If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For
    
                ''//Loop through each annotation
                For Each A In Annots.ArrayList
    
                    ''//PdfReader.GetPdfObject resolves an indirect reference to the object it points to
                    ''//(direct objects are returned as-is), so every entry can be treated as a dictionary
                    Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)
    
                    ''//Skip anything that is not a /Link annotation
                    If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For
    
                    ''//Get the ACTION for the current annotation. GetAsDict also resolves an indirect reference,
                    ''//which a plain DirectCast would not. Links that use /Dest instead of an /A action are skipped.
                    Dim AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A)

                    ''//Make sure this annotation has an ACTION
                    If AnnotationAction Is Nothing Then Continue For
    
                    ''//Test if it is a named action such as /FirstPage, /LastPage, etc.
                    If AnnotationAction.Get(PdfName.S).Equals(PdfName.NAMED) Then
                        Trace.Write("GOTO:")
                        If AnnotationAction.Get(PdfName.N).Equals(PdfName.FIRSTPAGE) Then
                            Trace.WriteLine(1)
                        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.NEXTPAGE) Then
                            Trace.WriteLine(Math.Min(I + 1, PageCount)) ''//Any links that go past the end of the document should just go to the last page
                        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.LASTPAGE) Then
                            Trace.WriteLine(PageCount)
                        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.PREVPAGE) Then
                            Trace.WriteLine(Math.Max(I - 1, 1)) ''//Any links that go before the first page should just go to the first page
                        End If
    
    
                        ''//Otherwise see if it's a GoTo action (a jump to a page in this document)
                    ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTO) Then
    
                        ''//Make sure that it has an explicit destination array (named destinations are stored as a name or string and are skipped here)
                        If AnnotationAction.GetAsArray(PdfName.D) Is Nothing Then Continue For

                        ''//An explicit destination looks like [pageRef /Fit] or [pageRef /XYZ left top zoom]: element 0 is an
                        ''//indirect reference to the target page and the remaining elements are the fitting options.
                        ''//PdfReader.GetPdfObject resolves that reference to the page dictionary itself. I could not find a way to get
                        ''//the page number straight from the reference, so the loop below compares the resolved page against every page
                        ''//in the document. This is very inefficient for large files.
                        Dim DestinationArray = AnnotationAction.GetAsArray(PdfName.D)

                        ''//Sanity check: skip destinations whose first element is not an indirect reference
                        If DestinationArray.Size = 0 OrElse Not TypeOf DestinationArray.ArrayList(0) Is PRIndirectReference Then Continue For
                        Dim AnnotationReferencedPage = PdfReader.GetPdfObject(DestinationArray.ArrayList(0))
                        Trace.Write("GOTO:")
                        ''//Re-loop through all of the pages in the main document comparing them to this page
                        For J = 1 To PageCount
                            If AnnotationReferencedPage.Equals(R.GetPageN(J)) Then
                                Trace.WriteLine(J)
                                Exit For
                            End If
                        Next
                    End If
                Next
            Next

            ''//Release the file handle held by the reader
            R.Close()
        End Sub
    
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            CreateSamplePdf()
            ListPdfLinks()
            Me.Close()
        End Sub
    End Class
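    The comments in ListPdfLinks point out that the GoTo branch rescans every page for every link. A common alternative is to build a lookup table once per document, keyed by each page's PDF object number, so each link costs a single dictionary lookup. The sketch below uses plain integers so it runs without iTextSharp; BuildPageIndex and FindPage are hypothetical names. Assumption: in iTextSharp 5 the page object numbers would come from R.GetPageOrigRef(J).Number and the link's number from the destination's first element (a PdfIndirectReference), with generation numbers ignored because they are usually 0. Verify those calls against your iTextSharp version.

```vb
Option Strict On

Imports System
Imports System.Collections.Generic

Module PageIndexDemo
    ' Hypothetical helper: pageObjectNumbers(i) is the PDF object number of page i + 1.
    ' Assumption: with iTextSharp 5 these would come from R.GetPageOrigRef(J).Number.
    Function BuildPageIndex(pageObjectNumbers As IList(Of Integer)) As Dictionary(Of Integer, Integer)
        Dim index As New Dictionary(Of Integer, Integer)()
        For i As Integer = 0 To pageObjectNumbers.Count - 1
            ' Object number -> 1-based page number
            index(pageObjectNumbers(i)) = i + 1
        Next
        Return index
    End Function

    ' Returns the page number for a destination's object number, or -1 if it is not a page.
    Function FindPage(index As Dictionary(Of Integer, Integer), objectNumber As Integer) As Integer
        Dim page As Integer
        If index.TryGetValue(objectNumber, page) Then Return page
        Return -1
    End Function

    Sub Main()
        ' Made-up object numbers for a 4-page document
        Dim index As Dictionary(Of Integer, Integer) = BuildPageIndex(New List(Of Integer) From {3, 7, 12, 15})
        Console.WriteLine(FindPage(index, 12))  ' 3
        Console.WriteLine(FindPage(index, 99))  ' -1
    End Sub
End Module
```

    The table is built once before the page loop, which turns the nested page-by-page comparison into one pass over the pages plus one lookup per link.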
    
