Java PDFBox, extract data from a column of a table

Posted by 扶醉桌前 on 2020-01-06 18:30:33

Question


I would like to find out how to extract data from a PDF like this one (example image): http://postimg.org/image/ypebht5dx/

For example, I want to extract only the values in the column "TENSIONE[V]", and whenever a blank cell is encountered, write the letter "X" to the output instead. How can I do this?

The code I used is this:

 PDDocument p=PDDocument.load(new File("a.pdf"));
 PDFTextStripper t=new PDFTextStripper();
 System.out.println(t.getText(p));

and I get this output:

http://s23.postimg.org/wbhcrw03v/Immagine.png


Answer 1:


These are just guidelines; adapt them to your use case. The code is untested, but it should help you solve your issue. Let me know if you have any questions.

String text = t.getText(p);
String[] lines = text.split("\\r?\\n"); // all lines, split on line breaks

String[] cols = lines[0].split("\\s+"); // tokens of the first line, split on whitespace
// cols[0] contains pins
// cols[1] contains TENSIONE[V]
// cols[2] contains TOLLRENZA; if that cell is blank, the array has no index 2
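
The splitting idea above can be extended to every line and to the "X" placeholder the question asks for. The following is only a sketch under stated assumptions: the class name `TensioneColumn` and the method `extractSecondColumn` are hypothetical, the input is the string returned by `t.getText(p)`, and a blank TENSIONE[V] cell is assumed to leave the row with just the pin token. Header lines are not filtered out. If a row has a blank TENSIONE[V] cell but a filled third column, whitespace splitting cannot tell the two apart, and a position-based approach (for example PDFBox's `PDFTextStripperByArea`) would probably be needed instead.

```java
import java.util.ArrayList;
import java.util.List;

public class TensioneColumn {

    // Returns the second whitespace-separated token of every non-empty line,
    // or "X" when the line has no second token (assumed to mean a blank cell).
    static List<String> extractSecondColumn(String text) {
        List<String> values = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty lines produced by the text stripper
            }
            String[] cols = trimmed.split("\\s+");
            values.add(cols.length > 1 ? cols[1] : "X");
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the output of t.getText(p)
        String sample = "1 12.5 0.1\n2\n3 5.0 0.2";
        System.out.println(extractSecondColumn(sample)); // prints [12.5, X, 5.0]
    }
}
```

In real use, the `sample` string would be replaced by the text extracted from the PDF, and the result checked against the table by eye, since the line layout produced by the stripper depends on the document.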


Source: https://stackoverflow.com/questions/16217999/java-pdfbox-extract-data-from-a-column-of-a-table
