Extract footer data of a PDF in Java

Submitted by 自作多情 on 2019-12-25 01:44:02

Question


I am able to get the text of PDF pages as a string, but the footer text is extracted along with it. I want to remove the footer from all pages of the PDF. How can I do that? I tried using Rectangle2D, but the coordinates I used do not return any text.


Answer 1:


In a comment the OP indicated that he used this code:

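import java.awt.geom.Rectangle2D;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFTextStripperByArea; // package location in PDFBox 1.8.x

PDDocument doc = PDDocument.load("xyz.pdf");
// get(1) is the SECOND page: the page list is zero-based
PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(1);
// x, y, width, height in points: a 10x10 pt box near the top-left corner
Rectangle2D region = new Rectangle2D.Double(10, 10, 10, 10);
String regionName = "region";
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
System.out.println("Region is " + stripper.getTextForRegion(regionName));
doc.close();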

For most documents, this code extracts no text because it only looks at a small (10x10 pt) region near the upper left corner of the second page of the document (get(1) is zero-based). Thus, the values in new Rectangle2D.Double(10, 10, 10, 10) have to change.
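To see why nothing comes back, it can help to look at the rectangle on its own. The sketch below uses only java.awt.geom.Rectangle2D (no PDFBox) and assumes, as the answer implies, that the region is given as x, y, width, height in points with y measured down from the top of the page. The text position (72, 72), one inch in from the top-left corner, is a hypothetical example of where body text might start.

```java
import java.awt.geom.Rectangle2D;

public class RegionCheck {
    // The rectangle from the question: x=10, y=10, width=10, height=10 (points)
    static final Rectangle2D TINY = new Rectangle2D.Double(10, 10, 10, 10);

    public static void main(String[] args) {
        // Hypothetical start of body text on a page with 1-inch (72 pt) margins
        double textX = 72, textY = 72;

        // Print the area the region actually covers
        System.out.println("x from " + TINY.getMinX() + " to " + TINY.getMaxX()
                + ", y from " + TINY.getMinY() + " to " + TINY.getMaxY());
        // The text start lies well outside the 10x10 pt box, so this prints false
        System.out.println("contains text start: " + TINY.contains(textX, textY));
    }
}
```

A 10x10 pt box is only about 3.5 mm square, so on most pages it does not overlap any text at all.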

The OP then commented: "I tried various regions, yet I am not getting any text. If you have an idea for a normal PDF page, please share it."

There is no such thing as a normal PDF page. The goal of PDF is to enable users to exchange and view electronic documents easily and reliably, independent of the environment in which they were created or the environment in which they are viewed or printed. There is no serious restriction on page dimensions or on where content is placed on a page.
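Because page sizes vary, one way to avoid hard-coding the rectangle is to derive it from each page's dimensions plus the heights of the header and footer bands you want to skip. This is a minimal sketch, not part of PDFBox: the class and method names are hypothetical, the header and footer heights are values you would measure yourself, and it assumes an unrotated page with y measured down from the top, as in the answer's examples. In PDFBox 1.8 the page width and height could come from page.findMediaBox().getWidth() and getHeight(), and the resulting rectangle is what you would pass to stripper.addRegion(...).

```java
import java.awt.geom.Rectangle2D;

public class BodyRegion {
    /**
     * Hypothetical helper: a full-width rectangle that skips a header band at the
     * top and a footer band at the bottom. All values are in points (1/72 inch),
     * with y measured down from the top edge of the page.
     */
    public static Rectangle2D.Float between(float pageWidth, float pageHeight,
                                            float headerHeight, float footerHeight) {
        float bodyHeight = pageHeight - headerHeight - footerHeight;
        if (bodyHeight <= 0) {
            throw new IllegalArgumentException("header and footer cover the whole page");
        }
        // x = 0 and full width; start just below the header, stop just above the footer
        return new Rectangle2D.Float(0f, headerHeight, pageWidth, bodyHeight);
    }

    public static void main(String[] args) {
        // Example: US Letter page (612 x 792 pt), 72 pt header, 54 pt footer (made-up values)
        Rectangle2D.Float body = between(612f, 792f, 72f, 54f);
        System.out.println("x=" + body.x + " y=" + body.y
                + " w=" + body.width + " h=" + body.height);
    }
}
```

Anything printed in the bottom 54 pt of the page (the assumed footer band) then falls outside the region and is not returned by getTextForRegion.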

For example, for the health-plan authorization form whose body text is quoted below, you need values like these:

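// First page of the document (index 0)
PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);
// x = 0, y = 230 pt from the top, full 612 pt page width, 300 pt tall
Rectangle2D region = new Rectangle2D.Float(0f, 230f, 612f, 300f);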

to extract the body "I authorize any health plan ... I have received a copy of this authorization." without headers, footers, or form lines.
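The form values can be checked with a little arithmetic. 612 pt is the width of a US Letter page (8.5 in x 72 pt/in), so the region spans the full page width. Assuming the form page also has Letter height (792 pt, an assumption, since the page size is not stated), the region runs from y = 230 to y = 530 and leaves out a 230 pt band at the top and a 262 pt band at the bottom. The hypothetical helper below does that subtraction for any region and page height.

```java
import java.awt.geom.Rectangle2D;

public class ExcludedBands {
    /** Height in points of the band above the region (y measured from the top). */
    static double topBand(Rectangle2D region) {
        return region.getMinY();
    }

    /** Height in points of the band below the region on a page of the given height. */
    static double bottomBand(Rectangle2D region, double pageHeight) {
        return pageHeight - region.getMaxY();
    }

    public static void main(String[] args) {
        // Region from the answer; the 792 pt page height is an assumption (US Letter)
        Rectangle2D region = new Rectangle2D.Float(0f, 230f, 612f, 300f);
        System.out.println("skipped at top:    " + topBand(region) + " pt");    // 230.0 pt
        System.out.println("skipped at bottom: " + bottomBand(region, 792) + " pt"); // 262.0 pt
    }
}
```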

If you have many similar pages (e.g. one large document whose pages share a similar layout), you only need to measure once and can reuse the same region for all of those pages.
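Measuring is easier once you recall that the default PDF user space unit is 1/72 inch, so a distance measured on a printout, or in a viewer that reports inches or millimetres, converts to points as shown here. This is a small hypothetical helper, not a PDFBox API. Once the header and footer heights are in points, the same rectangle can be registered once with addRegion(...) and then extracted with extractRegions(...) for every page that shares the layout.

```java
import java.util.Locale;

public class Points {
    private static final double POINTS_PER_INCH = 72.0;
    private static final double MM_PER_INCH = 25.4;

    /** Converts inches to PDF points (1 pt = 1/72 inch by default). */
    static double fromInches(double inches) {
        return inches * POINTS_PER_INCH;
    }

    /** Converts millimetres to PDF points. */
    static double fromMillimetres(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // e.g. a footer measured as 20 mm tall on a printed page (made-up value)
        System.out.printf(Locale.ROOT, "20 mm = %.1f pt%n", fromMillimetres(20));
        // US Letter width, matching the 612 pt used in the answer
        System.out.printf(Locale.ROOT, "8.5 in = %.1f pt%n", fromInches(8.5));
    }
}
```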



Source: https://stackoverflow.com/questions/26143942/extract-footer-data-of-pdf-in-java
