pdfbox 2.0.2 > How to combine the TextPosition coordinates and Graphics GeneralPath coordinates into the same quadrant

故事扮演 submitted on 2019-12-12 04:59:10

Question


I am new to pdfbox and want to extract data from a table. Tables with special formats, such as merged column headers, need to be processed with the help of the table's borderlines. I therefore need the coordinates of both the text and at least the table's horizontal borderlines.

To extract the text from the table, I used PDFTextStripper to get the list of TextPosition objects. To extract the horizontal lines from the same page, I used PDFGraphicsStreamEngine to collect the list of stroked GeneralPath objects; each stroked GeneralPath contains a corresponding Rectangle2D object that represents the line (height = 0). However, the coordinates of the TextPosition objects and those of the GeneralPath objects do not seem to be in the same quadrant: their Y axes appear to point in different directions from the same origin.

According to my investigation, the origin of a TextPosition object is the top left corner of the page, whereas the origin of a Rectangle2D is the bottom left corner, and the two Y axes point in opposite directions.
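If that reading is right, the two systems would differ only by a vertical flip about the page height. Here is a minimal sketch of that conversion using plain `java.awt.geom` types and no pdfbox calls. The class `CoordinateFlip`, its two helper methods, and the 792 pt page height are hypothetical and only for illustration. It also assumes an unrotated page whose visible area starts at (0, 0); in pdfbox the height would presumably come from the page's crop box or media box.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of pdfbox: converts between a bottom-left
// origin (y grows upward) and a top-left origin (y grows downward).
class CoordinateFlip {

    // Maps a y value given in top-left-origin space to bottom-left-origin space.
    // The same formula also maps in the opposite direction.
    static double flipY(double y, double pageHeight) {
        return pageHeight - y;
    }

    // Maps a rectangle from bottom-left-origin space to top-left-origin space.
    // The upper edge (maxY) of the source becomes the y of the result.
    static Rectangle2D toTopLeft(Rectangle2D rect, double pageHeight) {
        return new Rectangle2D.Double(
                rect.getX(),
                pageHeight - rect.getMaxY(),
                rect.getWidth(),
                rect.getHeight());
    }

    public static void main(String[] args) {
        double pageHeight = 792; // assumed page height in points (illustrative)

        // A horizontal borderline as a zero-height rectangle, bottom-left origin.
        Rectangle2D line = new Rectangle2D.Double(72, 700, 400, 0);
        Rectangle2D flipped = toTopLeft(line, pageHeight);

        // After the flip, the line's y can be compared with text y values
        // that are measured from the top of the page.
        System.out.println("line y from top: " + flipped.getY());
        System.out.println("text y from bottom: " + flipY(92, pageHeight));
    }
}
```

Once both sets of coordinates are in the same space, each text item can be assigned to a table row by comparing its y value with the y values of the flipped lines.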

First, I would like to confirm that my investigation is correct. If it is, I would appreciate a hint on how to bring the coordinates of Rectangle2D and TextPosition into the same quadrant.

Thanks in advance

Source: https://stackoverflow.com/questions/38962072/pdfbox-2-0-2-how-to-combine-the-textposition-coordinates-and-graphics-generalp
