How to extract images from a PDF using Java (not using PDFBox)

血红的双手。 posted on 2019-12-12 08:09:23

Question


I've been researching how to extract images from a big (> 300 MB) PDF file. I'm using PDFBox, but for some reason that I can't figure out, some pages are not extracted correctly.

I'm using the PDFToImage class of PDFBox as the base for my code.

So, do you know another library that may help me do this? I know that iText may be used, but I read that it can't be used for commercial products.

I've installed the packages xpdf and xpdf-utils, and the utility called pdfimages works perfectly. But I need to solve this problem from Java, and the solution should be portable.


Answer 1:


I think you're talking about two different things here: extracting images from a PDF, and converting PDF pages to images. PDFToImage will output an image for every page, while pdfimages extracts all embedded images (e.g. a text document has 0 images).

Take a look at org.apache.pdfbox.ExtractImages to see if it does what you want.
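
To make the distinction concrete, here is a minimal sketch of driving pdfimages (the embedded-image extractor mentioned in the question) from Java as an external process. It assumes the pdfimages binary from xpdf is installed and on the PATH, so it is not a portable pure-Java solution. The class name PdfImagesRunner and its methods are hypothetical names chosen for illustration; the -j, -f and -l options are the ones documented for pdfimages.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PdfImagesRunner {

    /**
     * Builds the argument list for:
     *   pdfimages -j [-f first -l last] input.pdf outputRoot
     * A page range is added only when firstPage > 0 and lastPage >= firstPage.
     */
    public static List<String> buildCommand(String pdfPath, String outputRoot,
                                            int firstPage, int lastPage) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdfimages");   // assumption: the binary is on the PATH
        cmd.add("-j");          // write JPEG (DCT) images as .jpg files
        if (firstPage > 0 && lastPage >= firstPage) {
            cmd.add("-f");
            cmd.add(String.valueOf(firstPage));
            cmd.add("-l");
            cmd.add(String.valueOf(lastPage));
        }
        cmd.add(pdfPath);
        cmd.add(outputRoot);    // prefix for the generated image file names
        return cmd;
    }

    /** Runs pdfimages and returns its exit code. */
    public static int extract(String pdfPath, String outputRoot,
                              int firstPage, int lastPage)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                buildCommand(pdfPath, outputRoot, firstPage, lastPage))
                .inheritIO()    // forward the tool's stdout/stderr to this JVM's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PdfImagesRunner <input.pdf> <output-root>");
            return;
        }
        int exitCode = extract(args[0], args[1], 0, 0);
        System.out.println("pdfimages exited with code " + exitCode);
    }
}
```

Because the extraction runs in a separate native process, the size of the PDF does not count against the JVM heap, but the tool has to be installed on every machine that runs the code.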




Answer 2:


The most likely reason a 300 MB PDF is hard to work with is that you run out of memory. If extraction works well for smaller PDFs, I would take a closer look at why it fails on the large one.
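
One quick way to test the memory hypothesis is to compare the size of the PDF with the maximum heap the JVM is allowed to use. The sketch below does that with standard library calls only. The class name HeapCheck, the helper heapLooksTooSmall and the factor of 2 are assumptions made for illustration, not a rule from any library; how much heap a PDF library really needs depends on how it loads the file.

```java
import java.io.File;

public class HeapCheck {

    /**
     * Hypothetical rule of thumb: report true when the heap limit is
     * smaller than `factor` times the size of the file.
     */
    public static boolean heapLooksTooSmall(long fileBytes, long maxHeapBytes, double factor) {
        return maxHeapBytes < (long) (fileBytes * factor);
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: HeapCheck <input.pdf>");
            return;
        }
        long fileBytes = new File(args[0]).length();
        // Upper bound on the heap this JVM will try to use (controlled by -Xmx).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        System.out.printf("file size: %d MB, max heap: %d MB%n",
                fileBytes / (1024 * 1024), maxHeapBytes / (1024 * 1024));

        if (heapLooksTooSmall(fileBytes, maxHeapBytes, 2.0)) {
            System.out.println("Heap may be too small; try a larger -Xmx value, e.g. java -Xmx2g ...");
        }
    }
}
```

If raising -Xmx makes the failing pages come out correctly, memory was the likely cause; if not, the problem is more likely in how those particular pages are encoded.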




Answer 3:


Have you tried ICEpdf or JPedal (both pure Java)?



Source: https://stackoverflow.com/questions/4315836/how-to-extract-images-from-pdf-using-java-not-using-pdfbox
