Unable to verify digital signature using Apache PDFBox

Deadly, submitted on 2019-12-03 15:15:54

Your main problem is that there are several types of PDF signatures, which differ in the format of the signature container and in which bytes are actually signed. Your BC (BouncyCastle) code, on the other hand, can only verify naked signature byte sequences, which are themselves contained in the aforementioned signature containers.
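To make "naked signature" concrete, here is a minimal sketch that verifies a bare signature value over some bytes with nothing but the JDK's java.security.Signature. No container is parsed, which is why this kind of check cannot be pointed directly at the Contents of a PKCS#7/CMS-based PDF signature. The class and method names are hypothetical, the key pair is a throwaway generated for the demo, and SHA256withRSA is merely an assumed algorithm for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical demo class: bare ("naked") signature creation and verification.
class NakedSignatureDemo {

    // Produces a bare RSA signature value over the given bytes.
    static byte[] signNaked(PrivateKey key, byte[] data) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(data);
        return signer.sign();
    }

    // Verifies a bare signature value; there is no container, no certificates,
    // no time stamps, only the signed bytes, the key and the signature value.
    static boolean verifyNaked(PublicKey key, byte[] signedBytes, byte[] signatureValue) throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(signedBytes);
        return verifier.verify(signatureValue);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();

        // Stand-in for the signed bytes of a document (assumption for the demo).
        byte[] data = "stand-in for the signed bytes".getBytes(StandardCharsets.UTF_8);
        byte[] signature = signNaked(pair.getPrivate(), data);

        System.out.println("untouched data: " + verifyNaked(pair.getPublic(), data, signature));
        data[0] ^= 1; // flip one bit to simulate tampering
        System.out.println("tampered data:  " + verifyNaked(pair.getPublic(), data, signature));
    }
}
```

For the container-based signature types, the signature value, the signer certificate and often signed attributes are wrapped inside the container, so they have to be unpacked first.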

Interoperable signature types

As the heading already says, the following list contains "interoperable signature types", which are more or less strictly defined. The PDF specification also allows completely custom signing schemes, but let us assume we are in an interoperable situation. The collection of signature types then boils down to:

  • adbe.x509.rsa_sha1, defined in ISO 32000-1 section 12.8.3.2 PKCS#1 Signatures; the signature value Contents contains a DER-encoded PKCS#1 binary data object; this data object is a fairly naked signature, in the case of RSA an encrypted structure containing the padded document hash and the hash algorithm.

  • adbe.pkcs7.sha1, defined in ISO 32000-1 section 12.8.3.3 PKCS#7 Signatures; the signature value Contents contains a DER-encoded PKCS#7 binary data object; this data object is a large container which can also carry meta-information, e.g. certificates for building certificate chains, revocation information for certificate revocation checks, and digital time stamps to fix the signing time. The SHA1 digest of the document’s byte range shall be encapsulated in the PKCS#7 SignedData field with ContentInfo of type Data. The digest of that SignedData shall be incorporated as the normal PKCS#7 digest.

  • adbe.pkcs7.detached, defined in ISO 32000-1 section 12.8.3.3 PKCS#7 Signatures; the signature value Contents contains a DER-encoded PKCS#7 binary data object, see above. The original signed message digest over the document’s byte range shall be incorporated as the normal PKCS#7 SignedData field. No data shall be encapsulated in the PKCS#7 SignedData field.

  • ETSI.CAdES.detached, defined in ETSI TS 102 778-3 and also integrated into ISO 32000-2; the signature value Contents contains a DER-encoded SignedData object as specified in CMS; CMS signature containers are close relatives of PKCS#7 signature containers, see above. This is essentially a differently profiled and more strictly defined variant of adbe.pkcs7.detached.

  • ETSI.RFC3161, defined in ETSI TS 102 778-4 and also integrated into ISO 32000-2; the signature value Contents contains a TimeStampToken as specified in RFC 3161; time stamp tokens are again close relatives of PKCS#7 signature containers, see above, but they contain a special data sub-structure holding the document hash, the time of the stamp creation, and information on the issuing time server.
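The list above can be turned into a small dispatch step that decides, from the /SubFilter name, which kind of object to expect in Contents before any verification code runs. This is only a sketch: the class name, the enum and its constants are hypothetical labels invented for illustration, and only the SubFilter strings come from the specifications.

```java
// Hypothetical helper: maps a signature /SubFilter name to the kind of
// object expected in the signature's Contents entry.
class SubFilterKind {

    enum Container {
        PKCS1_BARE,               // adbe.x509.rsa_sha1
        PKCS7_ENCAPSULATED_SHA1,  // adbe.pkcs7.sha1
        CMS_DETACHED,             // adbe.pkcs7.detached, ETSI.CAdES.detached
        RFC3161_TIMESTAMP,        // ETSI.RFC3161
        CUSTOM                    // anything else: not an interoperable type
    }

    static Container classify(String subFilter) {
        if (subFilter == null) {
            return Container.CUSTOM;
        }
        switch (subFilter) {
            case "adbe.x509.rsa_sha1":
                return Container.PKCS1_BARE;
            case "adbe.pkcs7.sha1":
                return Container.PKCS7_ENCAPSULATED_SHA1;
            case "adbe.pkcs7.detached":
            case "ETSI.CAdES.detached":
                return Container.CMS_DETACHED;
            case "ETSI.RFC3161":
                return Container.RFC3161_TIMESTAMP;
            default:
                return Container.CUSTOM;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "adbe.x509.rsa_sha1", "adbe.pkcs7.sha1", "adbe.pkcs7.detached",
            "ETSI.CAdES.detached", "ETSI.RFC3161", "my.custom.scheme"
        };
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Grouping the two detached types together reflects only that both carry a detached SignedData container; the CAdES profile imposes additional requirements that a real validator would still have to check.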

I suggest studying the specifications named above and the documents they reference, mostly RFCs. Based on that knowledge you can find the appropriate BouncyCastle classes to analyze the different signature Contents.

A working example to validate adbe.pkcs7.detached PDF signatures (the most common PDF signatures) with Apache PDFBox 1.8.x:

import java.io.File;
import java.io.FileInputStream;
import java.security.cert.CertStore;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;
import org.bouncycastle.cms.CMSProcessable;
import org.bouncycastle.cms.CMSProcessableByteArray;
import org.bouncycastle.cms.CMSSignedData;
import org.bouncycastle.cms.SignerInformation;
import org.bouncycastle.cms.SignerInformationStore;

public class PDFBoxValidateSignature {
    public static void main(String[] args) throws Exception {
        File signedFile = new File("sample-signed.pdf");
        // Load the signed document.
        PDDocument document = PDDocument.load(signedFile);
        try {
            List<PDSignature> signatureDictionaries = document.getSignatureDictionaries();
            // Validate the signatures one at a time.
            for (PDSignature signatureDictionary : signatureDictionaries) {
                // NOTE: this code currently supports only "adbe.pkcs7.detached", the most common signature /SubFilter anyway.
                // Contents is the signature container; the signed content is the part of the file covered by the byte range.
                byte[] signatureContent = signatureDictionary.getContents(new FileInputStream(signedFile));
                byte[] signedContent = signatureDictionary.getSignedContent(new FileInputStream(signedFile));
                // Construct the PKCS#7 / CMS object from the detached content and the container.
                CMSProcessable cmsProcessableInputStream = new CMSProcessableByteArray(signedContent);
                CMSSignedData cmsSignedData = new CMSSignedData(cmsProcessableInputStream, signatureContent);
                SignerInformationStore signerInformationStore = cmsSignedData.getSignerInfos();
                Collection signers = signerInformationStore.getSigners();
                // NOTE: getCertificatesAndCRLs and verify(PublicKey, String) belong to the older BouncyCastle CMS API.
                CertStore certs = cmsSignedData.getCertificatesAndCRLs("Collection", (String) null);
                Iterator signersIterator = signers.iterator();
                while (signersIterator.hasNext()) {
                    SignerInformation signerInformation = (SignerInformation) signersIterator.next();
                    // Pick the certificate in the container that matches this signer.
                    Collection certificates = certs.getCertificates(signerInformation.getSID());
                    Iterator certIt = certificates.iterator();
                    X509Certificate signerCertificate = (X509Certificate) certIt.next();
                    // Validate the document signature itself.
                    if (signerInformation.verify(signerCertificate.getPublicKey(), (String) null)) {
                        System.out.println("PDF signature verification is correct.");
                        // IMPORTANT: you should usually also validate the signing certificate at this point, e.g. trust, validity, revocation, etc. See http://www.nakov.com/blog/2009/12/01/x509-certificate-validation-in-java-build-and-verify-chain-and-verify-clr-with-bouncy-castle/.
                    } else {
                        System.out.println("PDF signature verification failed.");
                    }
                }
            }
        } finally {
            // Release the resources held by the loaded document.
            document.close();
        }
    }
}

I am not sure whether there is an official example for this; I checked the official examples for PDFBox 1.8.4 and did not find anything.
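For intuition about the "document's byte range" that the detached signature types sign: the signature dictionary's /ByteRange entry is an array of offset/length pairs, and the signed bytes are the concatenation of those ranges, which normally leaves out the gap holding the Contents value. The sketch below shows that idea on an in-memory byte array with plain JDK classes. It is a conceptual illustration under those assumptions, not PDFBox's actual implementation of getSignedContent, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: collects the bytes covered by a /ByteRange array.
class ByteRangeDemo {

    // byteRange holds pairs: [offset0, length0, offset1, length1, ...]
    static byte[] signedBytes(byte[] file, int[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must contain offset/length pairs");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < byteRange.length; i += 2) {
            // Copy 'length' bytes starting at 'offset' into the output.
            out.write(file, byteRange[i], byteRange[i + 1]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Toy "file": the middle part stands in for the Contents gap.
        byte[] file = "AAAA<sig>BBBB".getBytes(StandardCharsets.US_ASCII);
        int[] byteRange = {0, 4, 9, 4};
        byte[] signed = signedBytes(file, byteRange);
        System.out.println(new String(signed, StandardCharsets.US_ASCII)); // prints AAAABBBB
    }
}
```

A validator would typically also check that the byte ranges cover the whole file except the Contents gap, since bytes outside the ranges are not protected by the signature.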
