PDF Signature digest

Question

I have a quick question about calculating the digest of a PDF document to use for a digital signature (somewhat related to one of my earlier questions, I'm trying to figure out why you would need to know a client's certificate to create the correct digest). In Adobe's documentation about the PDF format the following is specified:

A byte range digest shall be computed over a range of bytes in the file, that shall be indicated by the ByteRange entry in the signature dictionary. This range should be the entire file, including the signature dictionary but excluding the signature value itself (the Contents entry).

So at this point things seem fairly simple: just digest everything except the /Contents entry in the /Sig dictionary. The actual data in the /Contents entry is specified as follows:

For public-key signatures, Contents should be either a DER-encoded PKCS#1 binary data object or a DER-encoded PKCS#7 binary data object.

So still no problems, I can (probably) generate the digest, reserve space for the /Contents entry and attach this PKCS#7 object later on. The confusion starts when I read the following:

Revocation information is a signed attribute, which means that the signing software must capture the revocation information before signing. A similar requirement applies to the chain of certificates. The signing software must capture and validate the certificate's chain before signing.

So the thing I'm not quite getting: Apparently the /Contents entry (containing the certificate and signed digest) is not digested, yet the chain of certificates is a signed attribute (and thus needs to be digested?).

I would appreciate it if someone could further specify exactly what is digested, and perhaps better explain the signed attributes to me. The main question that I want to answer is: Can I actually create a signable digest without knowing someone's certificate beforehand? (I'm working with a pkcs7 detached signature)

Answer 1:

In short:

Can I actually create a signable digest without knowing someone's certificate beforehand?

In case of SubFilter ETSI.CAdES.detached or adbe.pkcs7.detached you can create the document digest without knowing someone's certificate beforehand.

Usually, though, you have to know the signer certificate before you start generating the CMS signature container that is embedded into the PDF.

In detail:

(Beware, the following is somewhat simplified.)

I can (probably) generate the digest, reserve space for the /Contents entry and attach this PKCS#7 object later on.

Provided you reserve the space first and generate the digest afterwards, this is indeed how it is done.

The confusion starts when I read the following:

Revocation information is a signed attribute, which means that the signing software must capture the revocation information before signing. A similar requirement applies to the chain of certificates. The signing software must capture and validate the certificate's chain before signing.

So the thing I'm not quite getting: Apparently the /Contents entry (containing the certificate and signed digest) is not digested, yet the chain of certificates is a signed attribute (and thus needs to be digested?).

I would appreciate it if someone could further specify exactly what is digested, and perhaps better explain the signed attributes to me.

The main fact one has to be aware of is that in case of PKCS#7/CMS signature containers signing usually does not merely include one hash calculation but at least two!

The first hash, the document hash, is indeed calculated for the entire file, including the signature dictionary but excluding the signature value itself (the Contents entry) (you might want to read this answer for more details).
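A minimal sketch of this first hash, in Python with only the standard library. The toy byte string, the 16-digit placeholder, SHA-256, and the helper name `document_digest` are assumptions made for illustration; a real signer prepares a complete PDF with a much larger reserved /Contents value and a filled-in /ByteRange.

```python
import hashlib

def document_digest(pdf: bytes, byte_range: list[int]) -> bytes:
    """SHA-256 over the two signed ranges [off1, len1, off2, len2]."""
    off1, len1, off2, len2 = byte_range
    h = hashlib.sha256()
    h.update(pdf[off1:off1 + len1])
    h.update(pdf[off2:off2 + len2])
    return h.digest()

# Toy stand-in for a prepared PDF: the /Contents hex string is still all zeros.
head = b"%PDF-1.7 ... << /Type /Sig /Contents "
placeholder = b"<" + b"0" * 16 + b">"
tail = b" >> ... %%EOF"
pdf = head + placeholder + tail

# The gap between the two ranges is exactly the /Contents value.
gap_start = len(head)
gap_end = gap_start + len(placeholder)
byte_range = [0, gap_start, gap_end, len(pdf) - gap_end]

digest = document_digest(pdf, byte_range)

# Filling the reserved space later does not change the document hash.
signed = head + b"<" + b"ab" * 8 + b">" + tail
assert document_digest(signed, byte_range) == digest
```

Because the /Contents value lies outside both ranges, the signature container can be written into the reserved space afterwards without invalidating the hash.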

But this is not the hash immediately used when applying the signature algorithm.

During the generation of the PKCS#7/CMS signature container (unless in its most primitive form) you create a structure called "signed attributes".

You fill this structure with multiple attributes (name-value-pairs), among them the already calculated document hash but also others, e.g. the Adobe-style revocation information you read about.

When you have finished creating that structure, you hash this structure and generate a signature for it.

You then can put together the PKCS#7/CMS signature container using these signed attributes, the signature, and some more information not signed by this signature, e.g. certificates, signature time stamps, ...
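The second hash described above can be sketched as follows. This is a simplified illustration, not a usable CMS implementation: the hand-rolled `der` helper, the choice of SHA-256, and the restriction to only the two mandatory attributes (content type and message digest) are assumptions. Real profiles usually add further signed attributes, and the final private-key operation is not shown.

```python
import hashlib

def der(tag: int, content: bytes) -> bytes:
    """Encode one DER TLV (single-byte tag, definite length)."""
    n = len(content)
    if n < 0x80:
        length = bytes([n])
    else:
        raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(raw)]) + raw
    return bytes([tag]) + length + content

# Encoded OID bodies.
OID_CONTENT_TYPE = bytes.fromhex("2a864886f70d010903")    # 1.2.840.113549.1.9.3
OID_MESSAGE_DIGEST = bytes.fromhex("2a864886f70d010904")  # 1.2.840.113549.1.9.4
OID_DATA = bytes.fromhex("2a864886f70d010701")            # 1.2.840.113549.1.7.1

def attribute(oid: bytes, value: bytes) -> bytes:
    """Attribute ::= SEQUENCE { attrType OID, attrValues SET OF value }"""
    return der(0x30, der(0x06, oid) + der(0x31, value))

def signed_attributes(document_hash: bytes) -> bytes:
    attrs = [
        attribute(OID_CONTENT_TYPE, der(0x06, OID_DATA)),
        attribute(OID_MESSAGE_DIGEST, der(0x04, document_hash)),
    ]
    # DER requires SET OF members in sorted order; byte-wise sorting suffices here.
    return der(0x31, b"".join(sorted(attrs)))

# First hash: stand-in for the digest of the signed PDF byte ranges.
document_hash = hashlib.sha256(b"signed byte ranges of the PDF").digest()

# Second hash: computed over the encoded signed attributes, not over the PDF.
attrs_der = signed_attributes(document_hash)
hash_to_sign = hashlib.sha256(attrs_der).digest()
```

Here `hash_to_sign` is the value handed to the signature algorithm. Inside the finished SignerInfo the same attribute bytes are stored with the context tag 0xA0 instead of 0x31, but the hash is computed over the 0x31 (SET OF) encoding.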

For more details concerning the signature container read this answer.

Finally you embed this signature container into the reserved space in the PDF.

The main question that I want to answer is: Can I actually create a signable digest without knowing someone's certificate beforehand? (I'm working with a pkcs7 detached signature)

In case of SubFilter ETSI.CAdES.detached or adbe.pkcs7.detached you can create the document digest without knowing someone's certificate beforehand.

Depending on the CMS signature profile, though, you usually have to know the signer certificate before starting to generate the signature container because many profiles require the presence of a signed attribute referencing the signer certificate.

Clarifications:

The OP asked some follow-up questions in a comment:

1. One of the signed attributes is the document hash (without the /Contents), so if I understand correctly this is the unsigned hash?

As the "signed attributes" eventually are hashed and signed, that document hash therein is not immediately, directly signed but it is indirectly signed as part of this structure of attributes. So I wouldn't call it unsigned...

2. In the end when the user really generates a signature, he signs the hash of the PKCS#7 object?

No, he signs the hash of the "signed attributes" structure, which is only a part of the PKCS#7 object, not all of it. There are multiple parts of the PKCS#7/CMS object which are unsigned.

3. Does the /Contents entry still have a PKCS#7 object that's actually readable for us? (To extract certificates etc for verification)

The Contents entry does contain a full-fledged PKCS#7/CMS signature container object as a binary string. Thus, yes, you can read it (by reading the value of that binary string) and (if you have code that knows how to parse such a signature container) extract information from it.
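Reading that binary string can be sketched like this. The function names, the regular expression, and the toy "PDF" are assumptions for illustration: the sketch expects a single signature whose /Contents is a hex string sitting exactly in the gap between the two byte ranges, and the five-byte DER value stands in for a real signature container. Parsing the container itself needs a CMS library and is not shown.

```python
import re

def der_total_length(data: bytes) -> int:
    """Length of the outermost DER TLV, used to drop the zero padding."""
    first = data[1]
    if first < 0x80:
        return 2 + first
    n = first & 0x7F
    if n == 0:                      # BER indefinite length: keep everything
        return len(data)
    return 2 + n + int.from_bytes(data[2:2 + n], "big")

def extract_signature_container(pdf: bytes) -> bytes:
    m = re.search(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]", pdf)
    if m is None:
        raise ValueError("no /ByteRange found")
    off1, len1, off2, _len2 = (int(g) for g in m.groups())
    gap = pdf[off1 + len1:off2]     # the /Contents value, e.g. b"<3082...00>"
    if not (gap.startswith(b"<") and gap.endswith(b">")):
        raise ValueError("expected a hex string between the byte ranges")
    raw = bytes.fromhex(gap[1:-1].decode("ascii"))
    return raw[:der_total_length(raw)]

# Toy document: a tiny DER value, zero-padded to the reserved size.
head = b"%PDF-1.7 << /Type /Sig /Contents "
gap = b"<" + b"3003020105".ljust(16, b"0") + b">"
gap_start, gap_end = len(head), len(head) + len(gap)
tail_fmt = " /ByteRange [0 {} {} {:<10}] >> %%EOF"
len2 = len(tail_fmt.format(gap_start, gap_end, 0))
tail = tail_fmt.format(gap_start, gap_end, len2).encode("ascii")
pdf = head + gap + tail

container = extract_signature_container(pdf)
```

The outer DER length is used to cut off the padding rather than stripping trailing zero bytes, since a container could legitimately end in a zero byte.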

Beware, though, the signature container may not contain all data required for verification: E.g., if you verify using the chain (not shell) validation model, you might have to extract the signing time from the respective PDF signature dictionary entry.

4. When verifying a signature, do we simply extract the embedded PKCS#7 object, recalculate the digest, recalculate the digest of the PKCS#7 object and verify this against the signature using the certificate we get from the PKCS#7 object?

You obviously also have to calculate the digest of the signed PDF byte ranges and compare that value with the signed attribute containing the original document digest. (You might have meant that by "recalculate the digest".)
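That comparison might look like the following sketch. It assumes SHA-256 and assumes that `message_digest` has already been read from the message digest signed attribute by some CMS parser (not shown); the helper name `digest_matches` and the toy bytes are hypothetical.

```python
import hashlib
import hmac

def digest_matches(pdf: bytes, byte_range: list[int], message_digest: bytes) -> bool:
    """Recompute the hash of the signed ranges and compare it to the attribute."""
    off1, len1, off2, len2 = byte_range
    h = hashlib.sha256()
    h.update(pdf[off1:off1 + len1])
    h.update(pdf[off2:off2 + len2])
    return hmac.compare_digest(h.digest(), message_digest)

head, gap, tail = b"%PDF-1.7 head ", b"<00ff>", b" tail %%EOF"
pdf = head + gap + tail
byte_range = [0, len(head), len(head) + len(gap), len(tail)]

# Stand-in for the value stored in the signed attributes at signing time.
message_digest = hashlib.sha256(head + tail).digest()

ok = digest_matches(pdf, byte_range, message_digest)

# Changing a single signed byte makes the check fail.
tampered = pdf.replace(b"tail", b"tale")
still_ok = digest_matches(tampered, byte_range, message_digest)
```

This check only ties the PDF bytes to the signed attributes; the signature over the attributes and the signer certificate still have to be verified separately.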

As mentioned in the answer to 3, you might have to retrieve additional information from the PDF for use in the PKCS#7 verification.

Furthermore, you say "the certificate we get from the PKCS#7 object"; please be aware that the PKCS#7/CMS signature container may contain multiple certificates. You have to find the correct one. The CMS SignerInfo SignerIdentifier and the ESS signed attributes shall be used for that.

Furthermore you also have to verify validity and trust of the signer certificate.