Extract images from PDF using PDFBox

刺人心 2020-11-28 09:22

I'm trying to extract images from a PDF using PDFBox. The example PDF is here.

But I'm getting only blank images.

The code I'm trying:

public st         


        
8 answers
  •  栀梦 (original poster)
    2020-11-28 09:55

    Here is code using PDFBox 2.0.1 that returns a list of all images in the PDF. Unlike the other code, it recurses through the document's resources, including form XObjects, instead of only reading images from the top level.

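    // Imports needed by the two methods below (PDFBox 2.0.x).
    import java.awt.image.BufferedImage;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.graphics.PDXObject;
    import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

    // Collects the images of every page in the document.
    public List<BufferedImage> getImagesFromPDF(PDDocument document) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        for (PDPage page : document.getPages()) {
            images.addAll(getImagesFromResources(page.getResources()));
        }
        return images;
    }

    // Walks a resource dictionary, descending into form XObjects,
    // which can hold nested images of their own.
    private List<BufferedImage> getImagesFromResources(PDResources resources) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        if (resources == null) {
            return images; // a page or form may have no resources
        }
        for (COSName xObjectName : resources.getXObjectNames()) {
            PDXObject xObject = resources.getXObject(xObjectName);
            if (xObject instanceof PDFormXObject) {
                images.addAll(getImagesFromResources(((PDFormXObject) xObject).getResources()));
            } else if (xObject instanceof PDImageXObject) {
                images.add(((PDImageXObject) xObject).getImage());
            }
        }
        return images;
    }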
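
Since the question is about blank output, it may help to check the extracted images before saving them. The following is only a sketch using the standard library: `ExtractedImageWriter`, `isBlank`, and `writeAll` are hypothetical names (not part of PDFBox), and it assumes the extracted images are `BufferedImage` instances, which is what `PDImageXObject.getImage()` returns in PDFBox 2.0. Treating "every pixel identical" as blank is a rough heuristic, not a definitive test.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of PDFBox.
final class ExtractedImageWriter {

    // Returns true when every pixel has the same ARGB value,
    // a rough heuristic for a "blank" image.
    static boolean isBlank(BufferedImage image) {
        int first = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    // Writes each image as image-<index>.png into outputDir
    // and returns the files that were written, in order.
    static List<File> writeAll(List<BufferedImage> images, File outputDir) throws IOException {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < images.size(); i++) {
            File target = new File(outputDir, "image-" + i + ".png");
            // ImageIO.write returns false if no suitable writer was found.
            if (!ImageIO.write(images.get(i), "png", target)) {
                throw new IOException("No PNG writer available for image " + i);
            }
            written.add(target);
        }
        return written;
    }
}
```

Calling `isBlank` on each entry of the list returned by `getImagesFromPDF` shows whether the blank result comes from the extraction itself or from a later step such as writing the file.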
    
