iText: correctly get each image position within a page or document

Submitted by 天大地大妈咪最大 on 2019-12-22 18:51:14

Question


I am experimenting with iText to extract both text and images from PDF files, and I build an HTML file from the extracted text and images. The goal is to place each extracted image at its proper position within the text, rather than appending all images at the end as I do currently.

After some research, iText's renderInfo.getImageCTM() appears to be just what I need. However, for some of the images the coordinates it returns do not match the positions shown in Adobe Reader.

My MyImageRenderListener class contains this:

Matrix matrix = renderInfo.getImageCTM();
float x = matrix.get(Matrix.I31);
float y = matrix.get(Matrix.I32);

If I've understood it correctly, this should give me the X and Y position of the top-left corner of each image.
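One possible reading of these values, offered as a sketch rather than a definitive answer: in the default PDF user space the origin is normally the bottom-left corner of the page and Y increases upward, and an image is painted into the unit square transformed by the CTM. Under that reading, I31 and I32 are where the unit square's (0, 0) corner lands, which for an unrotated, unflipped image is its bottom-left corner, not its top-left. The plain-Java helper below does not use iText. The class name ImageCorners, the 200 x 100 image size, and the page height of 792 are all assumptions made for illustration; only the position (33.0, 358.5) comes from the page 1 output.

```java
// Hypothetical helper, not part of iText. The array {a, b, c, d, e, f} holds
// the six affine values of an image CTM; in iText's Matrix these correspond to
// I11, I12, I21, I22, I31, I32.
final class ImageCorners {

    /** Maps the point (x, y) of the unit square through the CTM. */
    static float[] transform(float[] ctm, float x, float y) {
        float a = ctm[0], b = ctm[1], c = ctm[2], d = ctm[3], e = ctm[4], f = ctm[5];
        return new float[] { a * x + c * y + e, b * x + d * y + f };
    }

    /** Returns {minX, minY, maxX, maxY} of the image in PDF user space. */
    static float[] boundingBox(float[] ctm) {
        float[][] corners = {
            transform(ctm, 0, 0), transform(ctm, 1, 0),
            transform(ctm, 0, 1), transform(ctm, 1, 1)
        };
        float minX = corners[0][0], maxX = corners[0][0];
        float minY = corners[0][1], maxY = corners[0][1];
        for (float[] p : corners) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        return new float[] { minX, minY, maxX, maxY };
    }

    /**
     * Distance from the top edge of the page to the top edge of the image,
     * i.e. a top-down Y as an HTML layout would use it. Assumes the page
     * box starts at Y = 0 and the page is not rotated.
     */
    static float topFromPageTop(float[] ctm, float pageHeight) {
        return pageHeight - boundingBox(ctm)[3];
    }

    public static void main(String[] args) {
        // Assumed 200 x 100 image at the page 1 position (33.0, 358.5).
        float[] ctm = { 200f, 0f, 0f, 100f, 33.0f, 358.5f };
        float[] box = boundingBox(ctm);
        System.out.println("bottom-left: " + box[0] + ", " + box[1]);
        System.out.println("top-right:   " + box[2] + ", " + box[3]);
        // 792 is an assumed page height (US Letter), not taken from the PDF.
        System.out.println("top-down Y:  " + topFromPageTop(ctm, 792f));
    }
}
```

Taking the bounding box over all four corners, instead of reading I31 and I32 alone, keeps the result meaningful if an image happens to be rotated or flipped.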

Here is the PDF file I will be referring to

It is a 4-page PDF with one or two images per page.

PAGE       X        Y
Page1      33.0     358.5

Page2 1st  321.7    419.9  
Page2 2nd  41.2     182    

Page3 1st  43.1     307.5  
Page3 2nd  417      58.5   

Page4      292.5    457.5  

When I compare the output coordinates to the actual document, the numbers do not seem to make sense. For example, on page 3 the Y of the second image (58.5) is lower than the Y of the first image (307.5) on the same page. Wouldn't that put image 2 on page 3 before image 1?
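A hedged note on the ordering: if the Y values are measured upward from the bottom of the page, as in the default PDF user space, then a smaller Y means lower on the page, so reading order from top to bottom corresponds to descending Y. The sketch below sorts the two page 3 positions that way. The names ReadingOrder and ImagePos are hypothetical, and the code assumes unrotated pages and compares the reported Y values directly, without accounting for image heights.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of iText.
final class ReadingOrder {

    /** Position of one extracted image, as reported via I31 and I32. */
    static final class ImagePos {
        final String label;
        final float x;
        final float y;

        ImagePos(String label, float x, float y) {
            this.label = label;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Returns a copy sorted top to bottom, then left to right, assuming
     * Y grows upward from the bottom of the page.
     */
    static List<ImagePos> topToBottom(List<ImagePos> images) {
        List<ImagePos> sorted = new ArrayList<>(images);
        sorted.sort(Comparator.comparingDouble((ImagePos p) -> p.y)
                .reversed()
                .thenComparingDouble(p -> p.x));
        return sorted;
    }

    public static void main(String[] args) {
        List<ImagePos> page3 = new ArrayList<>();
        page3.add(new ImagePos("page 3, image 2", 417f, 58.5f));
        page3.add(new ImagePos("page 3, image 1", 43.1f, 307.5f));
        for (ImagePos p : topToBottom(page3)) {
            System.out.println(p.label + " at y = " + p.y);
        }
    }
}
```

Under this assumption the image with Y = 307.5 sorts ahead of the one with Y = 58.5, which would be consistent with image 1 appearing above image 2 on the page.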

Also, none of the Y coordinates and only some of the X coordinates appear to be correct when compared to the layout shown in any reader.

What am I getting wrong or misunderstanding?

Source: https://stackoverflow.com/questions/26426345/itext-correctly-get-each-image-position-within-page-or-document
