Advanced PDF parser for Java

轻奢々 2020-12-01 04:48

I want to extract different content from a PDF file in Java:

  • The complete visible text
  • Images
  • Links

Is it also possible to get

5 Answers
  •  星月不相逢
    2020-12-01 05:29

    Most of this you can do with our PDF Library extended edition as well.

    Whichever solution you go for, bear in mind that for certain PDF documents, text extraction is impossible because of how the PDF is constructed: the glyphs on the page sometimes carry no semantic meaning, for example when a font has no mapping from its glyph codes to Unicode characters.

    The quick way to check this is to open the document in Acrobat and try copying/pasting the text. If it comes up as gibberish there, chances are it will come up as gibberish in any other PDF extractor.
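
    If you would rather automate that copy/paste check, here is a minimal sketch that uses only the Java standard library. It assumes you already have the extracted text as a String from whichever extractor you picked. The class name, method names, and the 0.3 threshold are all made up for illustration and are not part of any PDF library. It only flags text full of control, private-use, unassigned, or replacement characters. It will not catch gibberish that happens to decode into ordinary letters.

```java
/** Rough sanity check for text pulled out of a PDF (hypothetical helper, not a library API). */
public final class ExtractionSanityCheck {

    private ExtractionSanityCheck() {}

    /** True for code points that rarely appear in genuinely readable text. */
    static boolean isSuspicious(int codePoint) {
        if (codePoint == 0xFFFD) {
            return true; // Unicode replacement character
        }
        if (Character.isWhitespace(codePoint)) {
            return false; // newlines and tabs are control characters, but normal in text
        }
        int type = Character.getType(codePoint);
        return type == Character.CONTROL
                || type == Character.PRIVATE_USE
                || type == Character.UNASSIGNED
                || type == Character.SURROGATE; // unpaired surrogate
    }

    /** Fraction of code points that look suspicious; empty or null input counts as fully suspicious. */
    static double suspiciousRatio(String text) {
        if (text == null || text.isEmpty()) {
            return 1.0; // nothing was extracted at all
        }
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            total++;
            if (isSuspicious(codePoint)) {
                suspicious++;
            }
        }
        return (double) suspicious / total;
    }

    /** Flags the text when the suspicious fraction exceeds the given threshold (assumed, tune it). */
    static boolean looksLikeGibberish(String text, double threshold) {
        return suspiciousRatio(text) > threshold;
    }

    public static void main(String[] args) {
        String extracted = args.length > 0 ? args[0] : "Sample extracted text";
        System.out.println("suspicious ratio: " + suspiciousRatio(extracted));
        System.out.println("looks like gibberish: " + looksLikeGibberish(extracted, 0.3));
    }
}
```

    Run it on the text of the first page or two. If the check fires, the document probably needs OCR rather than a different extractor.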
