How to Digitally Sign a Dynamically Created PDF Document Using PDFBox?

Submitted by 孤人 on 2019-11-30 10:28:21

These hints were initially presented as comments to the original question, but they now merit being formulated as an answer:

Code issues

While there is too much code to review and fix without spending a considerable amount of time, and while the original absence of a sample PDF was a hindrance, a quick scan of the code revealed some issues:

  • The appendRawCommands(XXXFormStream.createOutputStream(), YYY) calls quite likely cause problems with PDFBox: creating an output stream for the same form more than once may be an issue, and so may switching back and forth between the forms.

  • Furthermore, there does not seem to be any whitespace between the multiple strings written to the same stream, giving rise to unknown operators like Qq. Moreover, the appendRawCommands method uses UTF-8, which is foreign to PDF.

  • The generateSignedDocument method most likely does quite a lot of damage, as it assumes it can work with PDFs as if they were text files. In general they are not.
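To illustrate the whitespace and encoding points, here is a minimal sketch of collecting the operators for one form in a single buffer, separating chunks by a newline and encoding them as ASCII bytes. The class ContentStreamWriter is hypothetical and not the OP's appendRawCommands; it assumes pure-ASCII operators (text strings with non-ASCII characters need the encoding of the font in use, which this sketch does not handle).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not the OP's appendRawCommands): collects the content
// stream operators of one form in a single buffer as ASCII bytes.
class ContentStreamWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Appends one chunk of operators followed by a newline, so that consecutive
    // chunks such as "Q" and "q" cannot fuse into an unknown operator "Qq".
    // Assumes the chunk is pure ASCII; other characters would become '?'.
    ContentStreamWriter append(String operators) {
        byte[] bytes = (operators + "\n").getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        return this;
    }

    // Returns everything appended so far; write it to the form's stream once.
    byte[] toBytes() {
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = new ContentStreamWriter()
                .append("q").append("1 0 0 1 50 50 cm").append("Q").toBytes();
        // prints q, the cm line and Q, each on its own line
        System.out.print(new String(content, StandardCharsets.US_ASCII));
    }
}
```

Buffering everything first also means the form's output stream only needs to be created once, instead of repeatedly as in the OP's code.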

Result PDF issues

The sample result PDF eventually provided by the OP allows one to pinpoint some issues that actually occurred:

  • Comparing the bytes of both documents (Report_08_05_23.pdf and Signed_Report_08_05_23.pdf), one finds many unwanted changes, at first glance especially the replacement of certain bytes by question marks. This is due to using ByteArrayOutputStream.toString() to conveniently operate on the document as a String and eventually changing it back into a byte[].

    Cf. the JavaDoc of ByteArrayOutputStream.toString():

    * <p> This method always replaces malformed-input and unmappable-character
    * sequences with the default replacement string for the platform's
    * default character set. The {@linkplain java.nio.charset.CharsetDecoder}
    * class should be used when more control over the decoding process is
    * required.
    

    Certain byte values do not represent characters in the platform's default character set. Therefore, they are transformed into the Unicode Replacement Character, which in the final transformation into a byte[] becomes 0x3f (the ASCII code for the question mark). This change kills compressed stream contents, both of content streams and of image streams.

    To fix this, one has to work with byte and byte[] operations instead of String operations here.
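A minimal sketch of both sides of this: roundTrip shows that decoding binary bytes into a String and encoding them back does not reproduce them, and indexOf is a byte-based replacement for String.indexOf. The class and method names are hypothetical; UTF-8 is used only to make the demo deterministic (with the platform default charset of the OP the damage showed up as 0x3f instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BytesNotStrings {
    // Decodes bytes to a String and encodes them back, as String-based editing does.
    // Invalid UTF-8 sequences become U+FFFD and cannot be restored afterwards.
    static byte[] roundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Byte-based counterpart of String.indexOf: the first index >= from at which
    // needle occurs in haystack, or -1 if it does not occur.
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // bytes as they might appear in a Flate-compressed stream
        byte[] compressed = {(byte) 0x78, (byte) 0x9c, (byte) 0xff};
        System.out.println(Arrays.equals(compressed, roundTrip(compressed))); // false

        byte[] pdf = "1 0 obj <</Contents <00>>>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] key = "Contents <".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOf(pdf, key, 0)); // 11
    }
}
```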

  • The stream object 8 0 references itself in its XObject resources, which might make any PDF viewer throw up. Please refrain from such circularity.

Signature Container issues

The signature does not verify. Thus, it has been reviewed, too.

  • Inspecting the signature container one can see that it is wrong: In spite of the signature being adbe.pkcs7.detached, the signature container embeds data. Looking at the code the reason becomes clear:

    CMSSignedData sigData = generator.generate(msg, true);
    

    The true parameter asks BC to encapsulate, i.e. embed, the msg data; for a detached signature it has to be false.

  • Looking further at the signing code, another issue becomes visible: The msg data above are not merely a digest, they already are a signature:

    Signature signature = Signature.getInstance(algorithm, BC);
    signature.initSign(privateKey);
    signature.update(docForSign.getBytes());
    CMSTypedData msg = new CMSProcessableByteArray(signature.sign());
    

    This is wrong, as the SignerInfoGenerator created later on is what creates the actual signature; msg should contain the signed document data themselves, not a signature of them.

Edit: After the issues mentioned above had been fixed or at least worked around, the signature still was not accepted by Adobe Reader. Thus, another look at the code revealed:

Hash value calculation issue

The OP constructs this ByteRange value

String finalByteRange = "/ByteRange [0 " + offsetContentStart + " " + offsetContentEnd + " " + secondPartLength + "]";

and later sets

String docFirstPart = docString.substring(0, offsetContentStart + 1);
String docSecondPart = docString.substring(offsetContentEnd - 1);

The + 1 and - 1 are intended to make these document parts also include the < and > enveloping the signature bytes. But the OP also uses these strings to construct the signed data:

String docForSign = docFirstPart.concat(docSecondPart);

This is wrong: the signed bytes do not contain the < and >. Thus, the hash value calculated later on is wrong, too, and Adobe Reader has good reason to assume the document has been manipulated.

That being said, there also are other issues bound to come up every once in a while:
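A minimal sketch of calculating the signed data correctly, working on bytes: concatenate exactly the two ranges given by the ByteRange, so the whole hex string including < and > falls into the gap. In the OP's terms this corresponds to docString.substring(0, offsetContentStart) followed by docString.substring(offsetContentEnd), without the + 1 and - 1. These bytes (not a pre-computed signature) are what should be handed to the CMS generator in detached mode. The class name and SHA-256 are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SignedRanges {
    // Concatenates the ranges [start1 len1 start2 len2] of the document. With a
    // correct ByteRange, the /Contents hex string including '<' and '>' lies
    // entirely in the gap between the two ranges and thus is excluded.
    static byte[] signedBytes(byte[] pdf, long[] byteRange) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int k = 0; k + 1 < byteRange.length; k += 2) {
            out.write(pdf, (int) byteRange[k], (int) byteRange[k + 1]);
        }
        return out.toByteArray();
    }

    // Digest over the signed bytes; SHA-256 is an assumption, use the algorithm
    // matching your signer.
    static byte[] digest(byte[] pdf, long[] byteRange) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(signedBytes(pdf, byteRange));
    }

    public static void main(String[] args) throws Exception {
        String text = "AB/Contents <0000>CD";
        byte[] pdf = text.getBytes(StandardCharsets.US_ASCII);
        int lt = text.indexOf('<');  // the first range ends right before '<'
        int gt = text.indexOf('>');  // the second range starts right after '>'
        long[] range = {0, lt, gt + 1, pdf.length - (gt + 1)};
        System.out.println(new String(signedBytes(pdf, range), StandardCharsets.US_ASCII));
        System.out.println(digest(pdf, range).length);
    }
}
```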

Offset and length updating issues

The OP inserts the byte range like this:

String interimByteRange = "/ByteRange [0 " + offsetContentStart + " " + offsetContentEnd + " " + secondPartLength + "]";
int byteRangeLengthDifference = interimByteRange.length() - initByteRange.length();
offsetContentStart = offsetContentStart + byteRangeLengthDifference;
offsetContentEnd = offsetContentEnd + byteRangeLengthDifference;
String finalByteRange = "/ByteRange [0 " + offsetContentStart + " " + offsetContentEnd + " " + secondPartLength + "]";
byteRangeLengthDifference += interimByteRange.length() - finalByteRange.length();
//Replace the ByteRange
docString = docString.replace(initByteRange, finalByteRange);

Every once in a while, offsetContentStart or offsetContentEnd will be slightly below some power of ten (10^n) before the correction and slightly above it afterwards, i.e. the number gains a digit. The line

byteRangeLengthDifference += interimByteRange.length() - finalByteRange.length();

tries to make up for this, but finalByteRange (which is what eventually is inserted into the document) still contains the uncorrected values.
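One common way to sidestep this (an assumption, not the OP's approach) is to write the ByteRange with fixed-width, space-padded numbers, both for the initial placeholder and for the final values. PDF arrays are whitespace-separated, so the padding is harmless, and replacing the placeholder then never shifts any offset. The class name and the width of 10 digits are illustrative choices.

```java
// Hypothetical helper: formats a ByteRange whose string length does not depend
// on the number of digits of its values.
class FixedWidthByteRange {
    // Reserved digits per number; 10 covers files smaller than 10^10 bytes.
    static final int WIDTH = 10;

    // Builds "/ByteRange [0 a b c]" with a, b and c left-aligned and padded with
    // spaces to WIDTH characters each.
    static String format(long firstLength, long secondStart, long secondLength) {
        for (long v : new long[] {firstLength, secondStart, secondLength}) {
            if (v < 0 || Long.toString(v).length() > WIDTH) {
                throw new IllegalArgumentException("value does not fit: " + v);
            }
        }
        String n = "%-" + WIDTH + "d";
        return String.format("/ByteRange [0 " + n + " " + n + " " + n + "]",
                firstLength, secondStart, secondLength);
    }

    public static void main(String[] args) {
        String placeholder = format(0, 0, 0);        // written when the document is built
        String real = format(9_999, 10_001, 123_456); // written in place afterwards
        System.out.println(placeholder.length() == real.length()); // true
    }
}
```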

Similarly, the startxref value inserted like this

docString = docString.substring(0, startxrefOffset).concat("startxref\n".concat(Integer.toString(xrefOffset))).concat("\n%%EOF\n");

may be longer than before, in which case the byte range (calculated beforehand) no longer covers the whole document.

Furthermore, finding the offsets of the relevant PDF objects using text searches over the whole document
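A minimal sketch of the order of operations that avoids this: append the startxref section first, and only then derive the ByteRange from the finished byte array, so the second range reaches the real end of file. This assumes the ByteRange is then written into a fixed-width placeholder so that writing it changes no length; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class FinalizeThenMeasure {
    // Appends the startxref section; its length depends on the number of digits
    // of xrefOffset, which is why nothing may be measured before this step.
    static byte[] appendTail(byte[] body, long xrefOffset) {
        byte[] tail = ("startxref\n" + xrefOffset + "\n%%EOF\n").getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(body, 0, body.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    // Derives the ByteRange from the finished document: the first range ends right
    // before the '<' at index lt, the second starts right after the '>' at index gt
    // and runs to the actual end of the file.
    static long[] byteRange(byte[] finishedPdf, int lt, int gt) {
        long secondStart = gt + 1L;
        return new long[] {0, lt, secondStart, finishedPdf.length - secondStart};
    }

    public static void main(String[] args) {
        byte[] body = "AB<00>CD\n".getBytes(StandardCharsets.US_ASCII);
        for (long xref : new long[] {7, 12345}) {
            byte[] pdf = appendTail(body, xref);
            long[] r = byteRange(pdf, 2, 5);
            // the second range always ends exactly at the end of the file
            System.out.println((r[2] + r[3]) + " == " + pdf.length);
        }
    }
}
```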

offsetContentStart = (documentOutputStream.toString().indexOf("Contents <") + 10 - 1);
offsetContentEnd = (documentOutputStream.toString().indexOf("000000>") + 7);
...
int xrefOffset = docString.indexOf("xref");
...
int startxrefOffset = docString.indexOf("startxref");

will fail for generic documents. E.g. if the document already contains previous signatures (or earlier incremental updates, each with its own xref and startxref), these searches will quite likely identify the wrong indices.
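As a partial mitigation for the startxref case (an assumption, not a full fix; robust code should rely on a real PDF parser, e.g. PDFBox's own signing support), one can search from the end of the file: the last startxref keyword points to the most recent cross-reference section. The class and method names below are hypothetical, and the whitespace handling only covers spaces, CR and LF.

```java
import java.nio.charset.StandardCharsets;

class LastStartxref {
    // Returns the index of the last occurrence of needle in haystack, or -1.
    static int lastIndexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = haystack.length - needle.length; i >= 0; i--) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Reads the decimal offset following the last "startxref" keyword.
    static long readStartxref(byte[] pdf) {
        byte[] keyword = "startxref".getBytes(StandardCharsets.US_ASCII);
        int pos = lastIndexOf(pdf, keyword);
        if (pos < 0) {
            throw new IllegalArgumentException("no startxref found");
        }
        int i = pos + keyword.length;
        while (i < pdf.length && (pdf[i] == ' ' || pdf[i] == '\r' || pdf[i] == '\n')) {
            i++;
        }
        long value = 0;
        int digits = 0;
        while (i < pdf.length && pdf[i] >= '0' && pdf[i] <= '9') {
            value = value * 10 + (pdf[i] - '0');
            i++;
            digits++;
        }
        if (digits == 0) {
            throw new IllegalArgumentException("no offset after startxref");
        }
        return value;
    }

    public static void main(String[] args) {
        String doc = "%PDF-1.7\n...startxref\n100\n%%EOF\n...startxref\n2345\n%%EOF\n";
        System.out.println(readStartxref(doc.getBytes(StandardCharsets.US_ASCII))); // 2345
    }
}
```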
