Confusion about current transformation matrix in a PDF

被刻印的时光 ゝ submitted on 2020-01-05 03:49:23

Question


I am confused about the current transformation matrix (CTM) in PDFs. For page 5 in this PDF, I examined the token stream (http://pastebin.com/k6g4BGih), which shows that the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}. The full output is at http://pastebin.com/9XaPQQm9 .

Next I used the following code to extract the line and curve commands from the same page, following code that @mkl provided in a related SO question:

  1. Main class: http://pastebin.com/htiULanR
  2. Helper classes:

    a. Class that extends PDFGraphicsStreamEngine: http://pastebin.com/zL2p75ha

    b. Path: http://pastebin.com/d3vXCgnC

    c. Subpath: http://pastebin.com/CxunHPiZ

    d. Segment: http://pastebin.com/XP1Dby6U

    e. Rectangle: http://pastebin.com/fNtHNtws

    f. Line: http://pastebin.com/042cgZBp

    g. Curve: http://pastebin.com/wXbXZdqE

In that code, I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]. So my questions are:

  1. Shouldn't these two CTMs be the same?

  2. Both these CTMs have scaling operations: the first one scales with a factor of 10 and the second one scales with a factor of 0.1. If I ignore the scaling, I can create an SVG which looks fairly close to the original PDF. But I am confused why that should happen. Do I need to consider all transformation matrices before the path instead of the last one?


Answer 1:


First of all: You say

the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}.

This is not correct. cm does not set the transformation matrix to the parameter values; it multiplies the matrix parameter with the former current transformation matrix and sets the result as the new current transformation matrix, a process also called concatenation. Thus:

  1. Shouldn't these two CTMs be the same?

No, because cm doesn't set, it concatenates!
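To make the difference between setting and concatenating concrete, here is a minimal sketch in plain Java. It does not use PDFBox; the class name CmConcatDemo and the helper concat are made up for this illustration, and a matrix is simply stored as the six numbers a b c d e f that appear in the content stream. It assumes that the CTM in effect before the instruction is 0.1 0 0 0.1 0 0.

```java
import java.util.Arrays;

// Illustration only: a PDF matrix "a b c d e f" stands for the 3x3 matrix
// [a b 0] [c d 0] [e f 1], stored here as a double[6].
class CmConcatDemo {

    // Returns m x ctm, the product that the cm instruction stores as the new CTM.
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    public static void main(String[] args) {
        double[] ctmBefore = {0.1, 0, 0, 0.1, 0, 0}; // assumed CTM before the instruction
        double[] operands = {10, 0, 0, 10, 0, 0};    // operands of "10 0 0 10 0 0 cm"

        // cm does not replace the CTM with the operands; it multiplies them in.
        double[] ctmAfter = concat(operands, ctmBefore);
        System.out.println(Arrays.toString(ctmAfter));
    }
}
```

With these inputs the scale by 10 cancels the earlier scale by 0.1, so the resulting CTM is (up to floating-point rounding) the identity matrix and not 10 0 0 10 0 0.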

Furthermore, the current transformation matrix (and all other graphics state values!) is changed not only by the explicit setter or concatenation instructions but also by the restore-state instruction (Q), which you currently ignore. Thus:

  2. Do I need to consider all transformation matrices before the path instead of the last one?

You may have to consider more than the last, but only those not undone by graphics state restoration.


Let's look at your example document...

When you want to keep track of the current transformation matrix, you have to inspect both the cm and the q/Q instructions. In the case of your page 5, the content stream, reduced to those instructions up to the first c curve instruction, looks like this:

q 0.1 0 0 0.1 0 0 cm
q
q 10 0 0 10 0 0 cm BT
[...large text object...]
ET Q
Q
q 
[...clip path definition...]
q 10 0 0 10 0 0 cm BT 
[...small text object...]
ET Q
Q
q 
[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c 

Assuming an identity matrix as the starting transformation matrix, this implies the following sequence of current transformation matrices and of the matrices saved on the graphics state stack:

CTM: 1 0 0 1 0 0

Stack: empty

q

CTM: 1 0 0 1 0 0

Stack: 1 0 0 1 0 0

0.1 0 0 0.1 0 0 cm

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0

q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

10 0 0 10 0 0 cm

CTM: 1 0 0 1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

BT
[...large text object...]
ET Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0

q 

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

[...clip path definition...]
q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

10 0 0 10 0 0 cm

CTM: 1 0 0 1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

BT 
[...small text object...]
ET Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0

q 

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c 

Thus, PDFBox is correct when you observe:

I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]
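
The bookkeeping traced above can also be replayed mechanically. The following is a minimal sketch in plain Java, not PDFBox code and not a real content stream parser: the class CtmTrackerDemo and its methods are hypothetical, the input is a whitespace-separated string, and everything except q, Q and cm is ignored (string literals, arrays and inline images are not handled). The input string is the page 5 skeleton from above with the text objects and clip path definitions left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustration only: tracks the CTM through q, Q and cm instructions.
class CtmTrackerDemo {

    // Returns m x ctm for matrices stored as the six numbers a b c d e f.
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    // Replays the tokens and returns the CTM in effect after the last one.
    static double[] replay(String content) {
        double[] ctm = {1, 0, 0, 1, 0, 0};           // assumed starting CTM: identity
        Deque<double[]> saved = new ArrayDeque<>();  // CTMs on the graphics state stack
        List<Double> operands = new ArrayList<>();   // numbers seen since the last operator

        for (String token : content.trim().split("\\s+")) {
            try {
                operands.add(Double.parseDouble(token));
                continue;                            // a number: wait for its operator
            } catch (NumberFormatException notANumber) {
                // not a number, so treat the token as an operator
            }
            switch (token) {
                case "q":                            // save the current CTM
                    saved.push(ctm.clone());
                    break;
                case "Q":                            // restore the most recently saved CTM
                    ctm = saved.pop();
                    break;
                case "cm": {                         // concatenate the last six operands
                    double[] m = new double[6];
                    for (int i = 0; i < 6; i++) {
                        m[i] = operands.get(operands.size() - 6 + i);
                    }
                    ctm = concat(m, ctm);
                    break;
                }
                default:                             // all other operators are ignored here
                    break;
            }
            operands.clear();
        }
        return ctm;
    }

    public static void main(String[] args) {
        String page5 = "q 0.1 0 0 0.1 0 0 cm q q 10 0 0 10 0 0 cm BT ET Q Q "
                + "q q 10 0 0 10 0 0 cm BT ET Q Q "
                + "q 0.737761 w 1 i 2086.54 2327.82 m";
        System.out.println(Arrays.toString(replay(page5)));
    }
}
```

Under these assumptions the replay ends with a CTM of 0.1 0 0 0.1 0 0 at the first path instruction, because both 10 0 0 10 0 0 cm instructions sit inside q/Q pairs that have already been closed by then.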



Source: https://stackoverflow.com/questions/38005345/confusion-about-current-transformation-matrix-in-a-pdf
