PdfContentStreamEditor rotating image on PDF file

余生颓废 submitted on 2020-01-06 06:35:20

Question


I have what I hope is an easy question. I'm trying to use iTextSharp to modify some PDF files, but the XMP metadata that iTextSharp puts at the end of the files seems to be ruining their layout (and I'm not conversant enough in the PDF format to understand why).

You can see from the two images above that the document appears to have been rotated. Comparing the PDF files as binary, however, the only difference appears to be some XMP metadata at the end of the files.

I've tried opening the files in several PDF viewers (Sumatra PDF, Edge Browser and Adobe Acrobat) and all show the same weirdness.

I guess I have two questions: a) How can the PDF file be so altered just by having XMP metadata at the end of the file? b) How can I make iTextSharp not produce this output? (iTextSharp only seems to do this when I add or edit content, not if I just strip out JavaScript or similar.)

<EDIT 1>
The iTextSharp code I'm using is the PdfContentStreamEditor (verbatim) from this post: https://stackoverflow.com/a/35915789/2535822
</EDIT 1>
<EDIT 2>
OK, it seems that it's not the XMP metadata. I got rid of that by using:

pdfStamper.XmpMetadata = new byte[0];

However, there is still a bunch of extra data placed at the end of the file:

2 0 obj
<</Producer(PDFCreator 2.5.2.5233; modified using iTextSharp’ 5.5.13 ©2000-2018 iText Group NV \(AGPL-version\))/CreationDate(D:20171206173510+10'30')/ModDate(D:20180325144710+11'00')/Title(þÿ
endobj
404 0 obj
<</Length 0/Type/Metadata/Subtype/XML>>stream

endstream
endobj
405 0 obj
<</Length 3638/Filter/FlateDecode>>stream
xœÍZmÅ/6ÒZ2ÁÆ€
....

</EDIT 2>


Answer 1:


You have indeed found a bug in the PdfContentStreamEditor I used in the referenced answer. The other issue requires knowing how to disable a special feature (or quirk, depending on the circumstances) of iText.

Rotation of the content

This part deals with the rotation of content in the sample document PHA-Pro 8 - File.pdf provided by the OP.

As you have already seen yourself, the rotation issue is connected to the fact that the rotation of the page in question is not 0.

Indeed, the iText PdfStamper has a feature which, for rotated pages, automatically rotates additions applied to the OverContent or UnderContent. This is handy if you want to add upright content to a page without applying the rotation yourself. In the case of the PdfContentStreamEditor, though, all coordinates we receive from the existing content already have the applicable rotation factored in.

Thus, we need to disable this feature. One can do so using the PdfStamper property RotateContents:

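using (PdfReader pdfReader = new PdfReader(source))
// (char)0 keeps the original PDF version; true opens the stamper in append mode.
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write), (char)0, true))
{
    // The edited content already accounts for the page rotation, so do not rotate it again.
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new PdfContentStreamEditor();

    // PDF page numbers are 1-based.
    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}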
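
As a rough illustration of why the extra rotation moves the content, here is a minimal sketch that applies a PDF transformation matrix to a coordinate. It assumes the automatic rotation amounts to prefixing the added content with a 90-degree rotation matrix of the form 0 1 -1 0 e 0; the exact values iText writes are not shown in this thread, so the translation of 792 and the sample point are purely hypothetical. RotationDemo and Transform are names made up for this example.

```csharp
using System;

public static class RotationDemo
{
    // Applies a PDF transformation matrix [a b c d e f] to the point (x, y):
    // x' = a*x + c*y + e, y' = b*x + d*y + f.
    public static (double X, double Y) Transform(double[] m, double x, double y)
    {
        return (m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]);
    }

    public static void Main()
    {
        // Assumed 90-degree rotation prefix; the translation 792 is hypothetical.
        double[] rotate90 = { 0, 1, -1, 0, 792, 0 };

        // A coordinate as it appears in the existing content stream.
        var original = (X: 100.0, Y: 200.0);

        // Where the same coordinate ends up if the rotation is applied on top.
        var moved = Transform(rotate90, original.X, original.Y);

        Console.WriteLine($"original:            ({original.X}, {original.Y})");
        Console.WriteLine($"with extra rotation: ({moved.X}, {moved.Y})");
    }
}
```

Because the coordinates copied from the existing content are already expressed for the page as stored, any additional matrix relocates them, which is why RotateContents has to be switched off for this kind of editing.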

Scrambling of text

This part deals with the scrambling of text in the sample document AS62061-2006.pdf provided by the OP.

You have found a bug in the PdfContentStreamEditor. Its Write method contains this loop:

foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}

It should instead be:

foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(null, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}

If the PdfWriter is passed to the ToPdf method of a PdfString and the PdfWriter uses encryption, the string contents get encrypted. Here, however, the string is written into a content stream, and in that case the individual string must not be encrypted; instead, the whole stream is eventually encrypted.

This applies to the PDF provided by the OP because

  • the PDF is encrypted using the default password and
  • the OP edited it using a PdfStamper in append mode, which encrypts the additions using the same password as the original file.
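
The layering problem can be sketched with a toy model. This is not iText code and not the real PDF cipher: Encrypt and Decrypt are a hypothetical byte-shift stand-in, and DoubleEncryptionDemo and ViewerSees are names invented for this example. The sketch only shows that a string operand encrypted on its own and then encrypted again as part of the stream is still scrambled after a viewer decrypts the stream once.

```csharp
using System;
using System.Linq;
using System.Text;

public static class DoubleEncryptionDemo
{
    // Toy cipher (byte-wise shift), a hypothetical stand-in for the real PDF
    // encryption; it only serves to make the layering visible.
    public static byte[] Encrypt(byte[] data, byte key) =>
        data.Select(b => unchecked((byte)(b + key))).ToArray();

    public static byte[] Decrypt(byte[] data, byte key) =>
        data.Select(b => unchecked((byte)(b - key))).ToArray();

    // Returns the string operand as a viewer would see it after decrypting
    // the stored stream exactly once.
    public static string ViewerSees(string operand, byte key, bool encryptOperand)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(operand);
        if (encryptOperand)
        {
            // Buggy path: the operand is encrypted before it enters the stream.
            bytes = Encrypt(bytes, key);
        }
        byte[] storedStream = Encrypt(bytes, key);    // whole stream is encrypted on save
        byte[] decoded = Decrypt(storedStream, key);  // viewer decrypts the stream once
        return Encoding.ASCII.GetString(decoded);
    }

    public static void Main()
    {
        Console.WriteLine("operand encrypted first: " + ViewerSees("Hello", 7, true));
        Console.WriteLine("operand written plainly: " + ViewerSees("Hello", 7, false));
    }
}
```

In this model, passing null instead of the writer corresponds to encryptOperand being false: the operand goes into the stream unencrypted and only the stream as a whole is encrypted.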

With the original code, the result looks like this:

With the fixed code, it looks like this:




Answer 2:


I can answer your second question. The metadata you are trying to remove is not supposed to be removed. The DLL of the AGPL version that you are using will add that metadata no matter what you do in code. You will not be able to remove it with iText, as doing so would directly violate its licence terms. Please refer to https://itextpdf.com/AGPL:

You must prominently mention iText and include the iText copyright and AGPL license in output file metadata.



Source: https://stackoverflow.com/questions/49472242/pdfcontentstreameditor-rotating-image-on-pdf-file
