Reading a table or cell value in a PDF file using Java?

难免孤独 2021-01-23 18:45

I have searched Java and PDF forums for a way to extract a text value from a table in a PDF file, but couldn't find any solution except JPedal (which is licensed and not open source).

2 answers

日久生厌 (OP)

2021-01-23 19:24

Try PDFTextStream. With it I was at least able to identify the column values. Earlier I was using iText and got stuck defining an extraction strategy, which is hard.

This API separates column cells by putting extra spaces between them. The spacing is consistent, so you can write your own logic to split each line into cells (I was missing this in iText).

import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;

public class PDFText {
    public static void main(String[] args) throws java.io.IOException {
        String pdfFilePath = "xyz.pdf";

        // Open the PDF and collect all extracted text in a StringBuilder.
        Document pdf = PDF.open(pdfFilePath);
        StringBuilder text = new StringBuilder(1024);
        try {
            pdf.pipe(new OutputTarget(text));
        } finally {
            // Release the file even if extraction fails.
            pdf.close();
        }
        System.out.println(text);
    }
}
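
To show the kind of logic I mean, here is a minimal sketch that splits the extracted text into rows and cells. It assumes columns are separated by at least two spaces and that no cell contains a double space; the class name ColumnSplitter, its methods, and the sample text are made up for illustration and are not part of PDFTextStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnSplitter {
    // Splits one line of extracted text into cells on runs of two or more spaces.
    // Assumption: a single space never separates two columns.
    static List<String> splitCells(String line) {
        return Arrays.asList(line.trim().split(" {2,}"));
    }

    // Splits every non-blank line of the extracted text into a row of cells.
    static List<List<String>> parseTable(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) {
                rows.add(splitCells(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the text collected from the PDF.
        String sample = "Name      Qty    Price\nBlue pen    3     1.50\n";
        for (List<String> row : parseTable(sample)) {
            System.out.println(row);
        }
    }
}
```

In practice you would pass text.toString() from the program above to parseTable. Empty cells cannot be detected this way, because a missing value just looks like a wider gap, so check the column count of each row before trusting it.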

A related question has already been asked on Stack Overflow.
