Remove XMP Metadata on PDF/A

Question

Is there a way to remove the XMP metadata from a PDF/A document without losing its PDF/A conformance?

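I found that the following code:

PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getCatalog();
// Drop the /Metadata (XMP stream) and /Properties entries from the document catalog
dict.remove(PdfName.METADATA);
dict.remove(PdfName.PROPERTIES);
// Discard objects that are no longer referenced
reader.removeUnusedObjects();

removes both the XMP metadata and the PDF/A conformance. Is there a way to remove the XMP while retaining the standard, or to reintroduce PDF/A conformance into the processed document?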

Thanks.

Answer 1:

You can't remove the XMP metadata from a PDF/A document; as you have found, doing so invalidates it as PDF/A. However, the amount of information you need to retain in the XMP packet is minimal.

It is described in this technical note: http://www.pdfa.org/publication/technical-note-tn0003-metadata-in-pdfa-1/

Basically, you need to retain the PDF/A identification (the part number and the conformance level); everything else can be discarded. Because this is XMP, you have several options. One is to go through a PDF library and edit the metadata there. The second, and potentially the quickest and easiest, is to use a library that supports reading and writing XMP in PDF, and simply replace the XMP packet in the file with one that contains only the information you need.

If you do this properly (without damaging the PDF file), it shouldn't invalidate the PDF or its PDF/A compliance status. I would still strongly advise checking the resulting files with a PDF/A validator before using this in a production workflow.
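As a rough illustration of what such a stripped-down packet could look like, here is a minimal sketch in plain Java that only builds the XMP text. The class and method names (MinimalPdfAXmp, build) are made up for this example and belong to no PDF library. The sketch assumes the conformance value is a plain letter such as "A" or "B", so no XML escaping is done. Writing the packet back into the PDF's metadata stream is still up to whichever PDF or XMP library you use.

```java
// Sketch: build a minimal XMP packet that keeps only the PDF/A identification.
// Class and method names are illustrative, not part of any PDF library.
class MinimalPdfAXmp {

    // Namespace of the PDF/A identification schema (conventional prefix "pdfaid").
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /**
     * @param part        PDF/A part number, e.g. 1 for PDF/A-1
     * @param conformance conformance level, e.g. "A" or "B" (assumed to need no XML escaping)
     */
    static String build(int part, String conformance) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
             + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
             + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
             + "  <rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">\n"
             + "   <pdfaid:part>" + part + "</pdfaid:part>\n"
             + "   <pdfaid:conformance>" + conformance + "</pdfaid:conformance>\n"
             + "  </rdf:Description>\n"
             + " </rdf:RDF>\n"
             + "</x:xmpmeta>\n"
             + "<?xpacket end=\"w\"?>";
    }

    public static void main(String[] args) {
        // Print a packet identifying the document as PDF/A-1b.
        System.out.println(build(1, "B"));
    }
}
```

The part and conformance values passed in must describe what the document already conforms to; the packet only declares conformance, it does not create it.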

There is one caveat, though, which is also mentioned in the technical note linked above:

"PDF/A-1 does not require a conforming document to contain any entries in the document information dictionary at all. Nevertheless, whenever those Info entries specified in the PDF 1.4 reference (except for the Trapped entry) are present, there must be an equivalent entry in the document’s Metadata, and both must match according to the provisions of PDF/A-1."

So if your document has entries in its document information dictionary, you either have to remove them or provide matching entries in the XMP packet.
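To make the matching requirement concrete, the sketch below lists which XMP properties would have to be present for a given set of Info entries. The Info-to-XMP pairs follow the crosswalk defined for PDF/A-1; the class and method names (InfoXmpCrosswalk, requiredXmp) are hypothetical, and the code only reports property names. It does not read a PDF or compare values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: which XMP properties must mirror the Info entries that are present.
class InfoXmpCrosswalk {

    // Info dictionary key -> equivalent XMP property under PDF/A-1.
    static final Map<String, String> INFO_TO_XMP = new LinkedHashMap<>();
    static {
        INFO_TO_XMP.put("Title", "dc:title");
        INFO_TO_XMP.put("Author", "dc:creator");
        INFO_TO_XMP.put("Subject", "dc:description");
        INFO_TO_XMP.put("Keywords", "pdf:Keywords");
        INFO_TO_XMP.put("Creator", "xmp:CreatorTool");
        INFO_TO_XMP.put("Producer", "pdf:Producer");
        INFO_TO_XMP.put("CreationDate", "xmp:CreateDate");
        INFO_TO_XMP.put("ModDate", "xmp:ModifyDate");
    }

    // Returns the XMP properties needed to match the given Info keys.
    // Keys with no entry in the map (such as Trapped) are skipped.
    static List<String> requiredXmp(Set<String> infoKeys) {
        List<String> required = new ArrayList<>();
        for (Map.Entry<String, String> e : INFO_TO_XMP.entrySet()) {
            if (infoKeys.contains(e.getKey())) {
                required.add(e.getValue());
            }
        }
        return required;
    }

    public static void main(String[] args) {
        Set<String> present = new HashSet<>(Arrays.asList("Title", "Producer", "Trapped"));
        System.out.println(requiredXmp(present));
    }
}
```

If the returned list is empty, because the Info dictionary is empty or has been removed, the minimal packet containing only the PDF/A identification is enough.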

Source: https://stackoverflow.com/questions/31529496/remove-xmp-metadata-on-pdf-a

Tags

java

pdf