Advanced PDF parser for Java

轻奢々 2020-12-01 04:48

I want to extract different content from a PDF file in Java:

  • The complete visible text
  • images
  • links

Is it also possible to get

5 answers
  •  广开言路
    2020-12-01 05:11

    Yes Alp, iText does offer the functionality you mentioned.

    READING PDFS

    iText isn't a PDF viewer: it can't convert a PDF to an image, and it can't be used to print a PDF. However, the PdfReader class gives you access to the objects that form a PDF document and to the content stream of each page. This content stream can be parsed, and if the content wasn't added as rasterized text, you can convert a page to plain text. Note that iText doesn't do OCR.

    Use the com.itextpdf.text.pdf.PdfReader class.
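
For the "complete visible text" part of the question, something along these lines should work. It is only a minimal sketch and assumes iText 5.x (the com.itextpdf.text packages) is on the classpath. The class name PdfTextDump and the helper extractText are made up for illustration; PdfReader and PdfTextExtractor are the iText classes. It does not cover images or links, and it will return nothing useful for scanned pages, since iText doesn't do OCR.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of iText.
class PdfTextDump {

    /** Returns the text of every page, with a newline appended after each page. */
    static String extractText(byte[] pdfBytes) throws IOException {
        PdfReader reader = new PdfReader(pdfBytes);
        try {
            StringBuilder text = new StringBuilder();
            // iText page numbers start at 1, not 0.
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                // Parses the page's content stream and returns its text.
                text.append(PdfTextExtractor.getTextFromPage(reader, page));
                text.append('\n');
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfTextDump <file.pdf>");
            return;
        }
        byte[] pdfBytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(extractText(pdfBytes));
    }
}
```

Run it with the path to a PDF as the only argument. The extracted text follows the order of the content stream, so multi-column layouts may not come out in reading order.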
