Question
I am confused about the current transformation matrix (CTM) in PDFs. For page 5 in this PDF, I examined the token stream (http://pastebin.com/k6g4BGih), which shows that the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}. The full output is at http://pastebin.com/9XaPQQm9 .
Next, I used the following code to extract the line and curve commands from the same page, based on code that @mkl provided in a related SO question:
- Main class: http://pastebin.com/htiULanR
- Helper classes:
a. Class that extends PDFGraphicsStreamEngine: http://pastebin.com/zL2p75ha
b. Path: http://pastebin.com/d3vXCgnC
c. Subpath: http://pastebin.com/CxunHPiZ
d. Segment: http://pastebin.com/XP1Dby6U
e. Rectangle: http://pastebin.com/fNtHNtws
f. Line: http://pastebin.com/042cgZBp
g. Curve: http://pastebin.com/wXbXZdqE
In that code, I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from the PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]. So my questions are:
Shouldn't these two CTMs be the same?
Both of these CTMs are pure scalings: the first scales by a factor of 10 and the second by a factor of 0.1. If I ignore the scaling, I can create an SVG that looks fairly close to the original PDF, but I do not understand why that works. Do I need to consider all transformation matrices before the path instead of the last one?
Answer 1:
First of all, you say:
the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}.
This is not correct. cm does not set the transformation matrix to the parameter values; it multiplies the matrix given by its parameters with the previous current transformation matrix and makes the product the new current transformation matrix, a process also called concatenation. Thus:
- Shouldn't these two CTMs be the same?
No, because cm doesn't set, it concatenates!
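To make the concatenation concrete, here is a minimal Java sketch. The class CtmConcat and its method concat are hypothetical helpers for illustration, not part of PDFBox. Matrices are plain six-element arrays in the PDF order a b c d e f, and the product is taken with the cm operand on the left, which is the row-vector convention the PDF specification uses.

```java
import java.util.Arrays;

// Hypothetical helper, not PDFBox API: PDF matrices as [a, b, c, d, e, f].
final class CtmConcat {

    /** Returns the product m x ctm, i.e. the new CTM after "m cm". */
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    public static void main(String[] args) {
        double[] ctm = {0.1, 0, 0, 0.1, 0, 0};   // CTM already in effect
        double[] operand = {10, 0, 0, 10, 0, 0}; // operands of the cm instruction
        System.out.println(Arrays.toString(concat(operand, ctm)));
    }
}
```

With the CTM already at 0.1 0 0 0.1 0 0, concatenating 10 0 0 10 0 0 gives the identity matrix (up to floating-point rounding), not a scale by 10.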
Furthermore, the current transformation matrix (and all other graphics state values!) is changed not only by the explicit setter or concatenation instructions but also by the restore-state instruction (Q), which you currently ignore. Thus:
- Do I need to consider all transformation matrices before the path instead of the last one?
You may have to consider more than the last, but only those not undone by graphics state restoration.
Let's look at your example document...
When you want to keep track of the current transformation matrix, you have to inspect both the cm and the q/Q instructions. In the case of your page 5, the content stream, reduced to those instructions and cut off at the first c curve instruction, looks like this:
q 0.1 0 0 0.1 0 0 cm
q
q 10 0 0 10 0 0 cm BT
[...large text object...]
ET Q
Q
q
[...clip path definition...]
q 10 0 0 10 0 0 cm BT
[...small text object...]
ET Q
Q
q
[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c
Assuming an identity transformation matrix at the start, this implies the following sequence of current transformation matrices, together with the matrices saved on the graphics state stack:
CTM: 1 0 0 1 0 0
Stack: empty
q
CTM: 1 0 0 1 0 0
Stack: 1 0 0 1 0 0
0.1 0 0 0.1 0 0 cm
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
10 0 0 10 0 0 cm
CTM: 1 0 0 1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
BT
[...large text object...]
ET Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
[...clip path definition...]
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
10 0 0 10 0 0 cm
CTM: 1 0 0 1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
BT
[...small text object...]
ET Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c
Thus, PDFBox is correct when you observe:
I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from the PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]
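The bookkeeping walked through above can also be sketched in plain Java. This is a simplified illustration under stated assumptions, not how PDFBox parses content streams. The class CtmTracker and its methods are hypothetical, the input is assumed to be whitespace-separated tokens containing only numbers and operators, and the elided text objects and clip paths of the real page are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not PDFBox API: tracks only the CTM through q, Q and cm.
final class CtmTracker {

    /** Returns m x ctm for PDF matrices given as [a, b, c, d, e, f]. */
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2], m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2], m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    /** Returns the CTM in effect at the first c operator, or null if there is none. */
    static double[] ctmAtFirstCurve(String content) {
        double[] ctm = {1, 0, 0, 1, 0, 0};           // assumed identity start
        Deque<double[]> stack = new ArrayDeque<>();  // saved CTMs (graphics state stack)
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            try {
                operands.add(Double.parseDouble(token));
                continue;                             // numeric operand, keep collecting
            } catch (NumberFormatException notANumber) {
                // not a number, so treat the token as an operator below
            }
            switch (token) {
                case "q":                             // save graphics state
                    stack.push(ctm.clone());
                    break;
                case "Q":                             // restore graphics state
                    if (!stack.isEmpty()) ctm = stack.pop();
                    break;
                case "cm":                            // concatenate, do not overwrite
                    if (operands.size() >= 6) {
                        double[] m = new double[6];
                        int start = operands.size() - 6;
                        for (int i = 0; i < 6; i++) m[i] = operands.get(start + i);
                        ctm = concat(m, ctm);
                    }
                    break;
                case "c":                             // first curve: report the CTM
                    return ctm;
                default:                              // other operators leave the CTM alone
                    break;
            }
            operands.clear();
        }
        return null;
    }

    public static void main(String[] args) {
        String stream = "q 0.1 0 0 0.1 0 0 cm q q 10 0 0 10 0 0 cm BT ET Q Q "
                + "q q 10 0 0 10 0 0 cm BT ET Q Q "
                + "q 0.737761 w 1 i 2086.54 2327.82 m "
                + "2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c";
        System.out.println(Arrays.toString(ctmAtFirstCurve(stream)));
    }
}
```

Run on the reduced stream from page 5, the sketch reports a scale of 0.1 at the first curve, because both 10 0 0 10 0 0 cm instructions sit inside q/Q pairs that are closed before the curve is drawn.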
Source: https://stackoverflow.com/questions/38005345/confusion-about-current-transformation-matrix-in-a-pdf