Reading a table or cell value in a PDF file using Java?

难免孤独 2021-01-23 18:45

I have gone through Java and PDF forums looking for a way to extract a text value from a table in a PDF file, but couldn't find any solution except JPedal (which is not open source and requires a license).

2 Answers
  •  日久生厌
    2021-01-23 19:24

    Try PDFTextStream. With it I am at least able to identify the column values. Earlier I was using iText and got stuck defining an extraction strategy; it's hard.

    This API separates cells in different columns with extra spaces, and the spacing is consistent, so you can write your own logic to split the columns (this was missing in iText).

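    import com.snowtide.PDF;
    import com.snowtide.pdf.Document;
    import com.snowtide.pdf.OutputTarget;
    
    public class PDFText {
        public static void main(String[] args) throws java.io.IOException {
            String pdfFilePath = "xyz.pdf";
    
            // Open the PDF with PDFTextStream
            Document pdf = PDF.open(pdfFilePath);
            // Buffer that receives the extracted text
            StringBuilder text = new StringBuilder(1024);
            try {
                // Pipe the document's text into the buffer; cells in
                // different table columns come out separated by runs of spaces
                pdf.pipe(new OutputTarget(text));
            } finally {
                // Release the file even if extraction fails
                pdf.close();
            }
            System.out.println(text);
        }
    }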
    

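    The answer leaves the column-splitting logic to you. Below is a minimal sketch of that logic using only the JDK. It assumes that columns are separated by two or more spaces and that text inside a single cell never contains more than one consecutive space. Empty cells, tab separators and cells that wrap across lines are not handled. `TableRowSplitter`, `splitRow` and `splitTable` are hypothetical names chosen for illustration, and the sample string stands in for the `StringBuilder` filled by `pdf.pipe(...)`.

```java
import java.util.ArrayList;
import java.util.List;

public class TableRowSplitter {

    // Hypothetical helper: splits one line of extracted text into cells,
    // assuming columns are separated by runs of two or more spaces while
    // text inside a single cell contains at most single spaces.
    public static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return cells; // blank line: no cells
        }
        for (String cell : trimmed.split(" {2,}")) {
            cells.add(cell);
        }
        return cells;
    }

    // Splits a whole block of extracted text into rows of cells,
    // skipping blank lines (\R matches any line break, Java 8+).
    public static List<List<String>> splitTable(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            List<String> cells = splitRow(line);
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample text shaped like a table; in practice this would be the
        // StringBuilder contents produced by pdf.pipe(...)
        String text = "Name        Qty    Price\n"
                    + "Blue widget   3      4.50\n"
                    + "\n"
                    + "Red gadget    12     0.99\n";
        for (List<String> row : splitTable(text)) {
            System.out.println(row);
        }
    }
}
```

    Running `main` prints one list per row, for example `[Blue widget, 3, 4.50]`. Splitting on two or more spaces rather than one keeps multi-word cells such as `Blue widget` together. If a particular PDF puts narrower gaps between some columns, the threshold would need tuning.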
    A related question has been asked on Stack Overflow.
