Insert PieceInfo in merged document with iTextSharp

Question

I have a process that merges several PDFs into a single PDF. This works great.

At the time of the merge, I want to add a PieceInfo at page level to track which documents were included in the merged file.

Let's say I have 3 documents in this order: Fester.pdf (2 pages), Gomez.pdf (2 pages) and Lurch.pdf (1 page). After the merge I will have 5 pages, and each page should have a PieceInfo with the name of the file it originated from. This way, if I go to page 4, I will know the page came from Gomez.pdf.
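The page-to-file bookkeeping described above can be sketched in plain Java, without any iText calls. This is only an illustration under the assumption that the input files are processed in a fixed order and their page counts are known up front; MergedPageSources, labelsPerPage and sourceOfPage are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no iText): works out which source file each merged page
// comes from, given the input files in merge order and their page counts.
final class MergedPageSources {

    private MergedPageSources() {}

    /**
     * Returns one label per merged page; index 0 holds the label for page 1.
     * The map must preserve merge order (for example a LinkedHashMap).
     */
    static List<String> labelsPerPage(Map<String, Integer> pageCountsInMergeOrder) {
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : pageCountsInMergeOrder.entrySet()) {
            // Every page of this source file gets the same label (its file name).
            for (int p = 0; p < entry.getValue(); p++) {
                labels.add(entry.getKey());
            }
        }
        return labels;
    }

    /** Looks up the source file of a merged page, using 1-based page numbers. */
    static String sourceOfPage(Map<String, Integer> pageCountsInMergeOrder, int mergedPage) {
        List<String> labels = labelsPerPage(pageCountsInMergeOrder);
        if (mergedPage < 1 || mergedPage > labels.size()) {
            throw new IllegalArgumentException("No such page: " + mergedPage);
        }
        return labels.get(mergedPage - 1);
    }
}
```

For example, with a LinkedHashMap holding Fester.pdf → 2, Gomez.pdf → 2 and Lurch.pdf → 1, sourceOfPage(docs, 4) returns "Gomez.pdf". The list from labelsPerPage is exactly the per-page value one would want to store on each page during the merge.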

During my search, I found this post: "Insert hidden digest in pdf using iText library", and I tried to implement the same approach in my process. The suggestion works great, but I could not figure out how to store the information per page.

Here is my code:

public static byte[] MergeDocuments(DocumentCollection myCollection)
{
    PdfImportedPage importedPage = null;

    // Merge the document streams
    using (MemoryStream stream = new MemoryStream())
    {
        // Create the iTextSharp document
        iTextSharp.text.Document pdfDoc = new iTextSharp.text.Document();

        // Create the PDF writer that listens to the document
        PdfCopy pdfCopy = new PdfCopy(pdfDoc, stream);
        if (pdfDoc != null && pdfCopy != null)
        {
            // Open the document and load content
            pdfDoc.Open();

            //Dictionary Entries
            PdfName appName = new PdfName("MyKey");
            PdfName dataName = new PdfName("Hash");

            //Class to add and retrieve the PieceInfo data
            DocumentPieceInfo dpi = new DocumentPieceInfo();

            //Loop through my collection. The document class has the BinaryFile and FileName
            foreach (Document doc in myCollection)
            {
                PdfReader reader = new PdfReader(doc.FileBinary);
                if (reader != null)
                {
                    int nPage = reader.NumberOfPages;
                    for (int n = 0; n < nPage; n++)
                    {
                        //Trying to add the PieceInfo
                        dpi.addPieceInfo(pdfCopy, appName, dataName, new PdfString(string.Format("Info Doc: {0}", doc.FileName)));
                        importedPage = pdfCopy.GetImportedPage(reader, n + 1);
                        pdfCopy.AddPage(importedPage);
                    }
                    // Close the reader
                    reader.Close();
                }
            }

            if (pdfCopy != null)
                pdfCopy.Close();

            if (pdfDoc != null)
                pdfDoc.Close();

            byte[] arrOutput = stream.ToArray();
            return arrOutput;

        }
    }
    return null;
}

And here is a small change to MKL's solution, changing the input to a PdfCopy:

public void addPieceInfo(PdfCopy reader, PdfName app, PdfName name, PdfObject value)
    {
        //PdfDictionary catalog = reader.getCatalog();
        PdfDictionary pieceInfo = reader.ExtraCatalog.GetAsDict(PIECE_INFO);
        if (pieceInfo == null)
        {
            pieceInfo = new PdfDictionary();
            reader.ExtraCatalog.Put(PIECE_INFO, pieceInfo);
        }

        PdfDictionary appData = pieceInfo.GetAsDict(app);
        if (appData == null)
        {
            appData = new PdfDictionary();
            pieceInfo.Put(app, appData);
        }

        PdfDictionary privateData = appData.GetAsDict(PRIVATE);
        if (privateData == null)
        {
            privateData = new PdfDictionary();
            appData.Put(PRIVATE, privateData);
        }

        appData.Put(LAST_MODIFIED, new PdfDate());
        privateData.Put(name, value);
    }

The code above only adds the PieceInfo for the last page :(

Does the PdfImportedPage object have a way to get the catalog?

How can I add this information at page level during the merge? And afterwards, how can I read the PieceInfo back from the pages? Just by looping through them?

Answer 1:

Please be aware that /PieceInfo will be deprecated in ISO 32000-2 (aka PDF 2.0). As an alternative, you can create your own key to add your own custom data. This is explained in my answer to the question "itext how to check if giant string is present on the pdf page".

You are asking: "Does the PdfImportedPage object have a way to get the catalog?"

This is not the right question to ask. If you study my answer well, you'll discover that you need access to the page dictionary. You can add a /PieceInfo entry (or your custom entry) to this page dictionary and then later retrieve it.
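To see why the question's addPieceInfo keeps only one value, here is a plain-Java model (not iText code) in which a HashMap stands in for a PdfDictionary. ExtraCatalog is a single dictionary for the whole document, so every call puts the same key into the same /Private dictionary and the last file name wins; one dictionary per page keeps one value per page. DictionaryModel and its methods are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not iText code: a HashMap stands in for a PDF dictionary.
// It contrasts writing a per-page value into one document-level (catalog)
// dictionary with writing it into each page's own dictionary.
final class DictionaryModel {

    private DictionaryModel() {}

    // Same key name as the dataName used in the question's code.
    static final String KEY = "Hash";

    /** One dictionary for the whole document: every put() targets the same key. */
    static Map<String, String> writeToCatalog(List<String> valuesPerPage) {
        Map<String, String> catalog = new HashMap<>();
        for (String value : valuesPerPage) {
            catalog.put(KEY, value); // overwrites the value stored for the previous page
        }
        return catalog;
    }

    /** One dictionary per page: each value lands in its own dictionary. */
    static List<Map<String, String>> writeToPages(List<String> valuesPerPage) {
        List<Map<String, String>> pages = new ArrayList<>();
        for (String value : valuesPerPage) {
            Map<String, String> pageDict = new HashMap<>();
            pageDict.put(KEY, value);
            pages.add(pageDict);
        }
        return pages;
    }
}
```

With the five per-page values from the question (Fester.pdf twice, Gomez.pdf twice, Lurch.pdf once), writeToCatalog ends up holding only Lurch.pdf, while writeToPages yields five dictionaries, the fourth of which holds Gomez.pdf.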

Take a look at the CustomPageDictKeyMerge example:

public void createPdf(String filename) throws IOException, DocumentException {
    PdfName marker = new PdfName("ITXT_PageMarker");
    List<PdfReader> readers = new ArrayList<PdfReader>();
    readers.add(new PdfReader(SRC1));
    readers.add(new PdfReader(SRC2));
    readers.add(new PdfReader(SRC3));
    Document document = new Document();
    PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
    document.open();
    int counter = 0;
    int n;
    PdfImportedPage importedPage;
    PdfDictionary pageDict;
    for (PdfReader reader : readers) {
        counter++;
        n = reader.getNumberOfPages();
        for (int p = 1; p <= n; p++) {
            pageDict = reader.getPageN(p);
            pageDict.put(marker, new PdfString(String.format("Page %s of document %s", p, counter)));
            importedPage = copy.getImportedPage(reader, p);
            copy.addPage(importedPage);
        }
    }
    // close the document
    document.close();
    for (PdfReader reader : readers) {
        reader.close();
    }
}

In this example, we add a special marker to the page dictionary before we import the page. As a result, this marker ends up in the page dictionary of the corresponding page in the merged document.
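The loop in CustomPageDictKeyMerge can be mirrored in plain Java to see which marker text each page of the merged file receives. This sketch does not call iText; it only repeats the counter logic and the String.format pattern from the example, taking the page counts as input. PageMarkers and markers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no iText): reproduces the marker strings that the
// CustomPageDictKeyMerge loop stores, in the order the pages are added.
final class PageMarkers {

    private PageMarkers() {}

    static List<String> markers(int[] pageCountsInMergeOrder) {
        List<String> result = new ArrayList<>();
        int counter = 0;
        for (int n : pageCountsInMergeOrder) {
            counter++; // 1-based document number, as in the merge example
            for (int p = 1; p <= n; p++) { // 1-based page number within that document
                result.add(String.format("Page %s of document %s", p, counter));
            }
        }
        return result;
    }
}
```

For the three files from the question (2, 2 and 1 pages), markers(new int[] {2, 2, 1}) yields "Page 1 of document 1", "Page 2 of document 1", "Page 1 of document 2", "Page 2 of document 2" and "Page 1 of document 3". Passing the source file name into the format call instead of counter would give the file-name label the question asks for.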

Take a look at the CustomPageDictKeyCreate example to find out how to retrieve these custom markers:

public void check(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary pagedict;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        pagedict = reader.getPageN(i);
        System.out.println(pagedict.get(new PdfName("ITXT_PageMarker")));
    }
    reader.close();
}
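When the markers are read back, their text can be turned into numbers again. A minimal sketch in plain Java, assuming the marker was written with the "Page %s of document %s" format from the merge example; PageMarkerParser is a hypothetical helper, not part of iText.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch (no iText): turns a marker text such as
// "Page 2 of document 2" back into its two numbers. Assumes the marker
// was written with the format used in the merge example.
final class PageMarkerParser {

    private static final Pattern MARKER =
            Pattern.compile("Page (\\d+) of document (\\d+)");

    private PageMarkerParser() {}

    /** Returns {pageInSource, documentNumber}, or null if there is no matching marker. */
    static int[] parse(String markerText) {
        if (markerText == null) {
            return null; // the page dictionary had no marker entry
        }
        Matcher m = MARKER.matcher(markerText);
        if (!m.matches()) {
            return null; // some other text, not written by the merge example
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }
}
```

Returning null for pages without a recognizable marker keeps the check loop simple when a merged file also contains pages that were not stamped.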

Please make sure that you use a second-class name for your custom key. iText has registered the prefix ITXT with ISO for its custom second-class keys. This prefix makes sure that different companies don't use the same key for different purposes: all keys starting with ITXT can easily be identified as keys created by iText Group. ISO keeps track of all these prefixes to avoid duplicates, and registering a prefix with ISO is free of charge.

Source: https://stackoverflow.com/questions/34617914/insert-pieceinfo-in-merged-document-with-itextsharp

Tags: pdf, merge, itextsharp