How to extract image bytes out of PDF efficiently

Question

Is there a way to extract the image bytes from a PDImageXObject, for different image types, without loading them into a BufferedImage? A 15 MB TIFF file takes up 200 MB of memory when loaded into a BufferedImage, which I would love to avoid.
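For a rough sense of where that growth comes from: a decoded raster holds every pixel uncompressed, so its size depends on the pixel dimensions rather than on the file size. Here is a minimal sketch of the arithmetic. The dimensions and the 4 bytes per pixel are hypothetical values chosen for illustration, not measurements of the actual TIFF.

```java
final class RasterSizeEstimate {

    /** Bytes needed for the decoded pixel samples alone (no object overhead). */
    static long decodedBytes(int width, int height, int bytesPerPixel) {
        // Widen to long first so large images do not overflow int arithmetic
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Hypothetical scan: 8000 x 6500 pixels at 4 bytes per pixel
        long bytes = decodedBytes(8000, 6500, 4);
        System.out.println(bytes + " bytes"); // 208000000 bytes, roughly 200 MB
    }
}
```

The compressed size on disk does not enter the calculation at all, which is why a small file can still produce a very large BufferedImage.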

I have found an example for JPG files, but I have no idea what it's doing or if it's possible to do the equivalent for other file types: PNG, GIF, TIFF etc.

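    import java.awt.image.BufferedImage;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Arrays;
    import java.util.List;
    import javax.imageio.ImageIO;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.io.IOUtils; // assumed; commons-io's IOUtils.copy also fits
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

    // I don't really understand this, but it works for JPEGs
    private static final List<String> PDF_JPEG_STOP_FILTERS = Arrays.asList(
        COSName.DCT_DECODE.getName(),
        COSName.DCT_DECODE_ABBREVIATION.getName());

    public void extractImage(PDImageXObject pdImage, OutputStream baos) throws IOException {
        if ("jpg".equals(pdImage.getSuffix())) {
            // Copy the still-encoded stream straight to the output
            try (InputStream is = pdImage.createInputStream(PDF_JPEG_STOP_FILTERS)) {
                IOUtils.copy(is, baos);
            }
        } else {
            // Decodes the whole image into memory, then re-encodes it as JPEG
            BufferedImage image = pdImage.getImage();
            // image.raster.data is huge
            ImageIO.write(image, "jpg", baos);
        }
    }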
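As far as I can tell, the JPEG branch works because a PDF stores JPEG data unchanged under the DCTDecode filter, so stopping the filter chain just before that filter leaves bytes that already form a JPEG file. JPXDecode (JPEG 2000) behaves the same way. PNG, GIF and TIFF files are typically not embedded as-is. Their pixels end up as raw samples under filters such as FlateDecode, LZWDecode or CCITTFaxDecode, so there is no original file to copy out. Below is a minimal sketch of that distinction, using only filter-name strings from the PDF specification. The class and method names are made up for illustration and are not part of PDFBox.

```java
import java.util.List;
import java.util.Set;

final class EncodedStreamCheck {

    // Filters whose still-encoded payload is already a complete image file
    // ("DCT" is the abbreviated name of DCTDecode)
    private static final Set<String> SELF_CONTAINED =
            Set.of("DCTDecode", "DCT", "JPXDecode");

    /**
     * Takes a stream's filter names in decode order and reports whether the
     * last one is self-contained, i.e. whether stopping just before it would
     * leave bytes that can be written out without decoding the pixels.
     */
    static boolean canCopyEncoded(List<String> filters) {
        if (filters.isEmpty()) {
            return false; // unfiltered stream: raw samples, not an image file
        }
        return SELF_CONTAINED.contains(filters.get(filters.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(canCopyEncoded(List.of("DCTDecode")));                // true
        System.out.println(canCopyEncoded(List.of("FlateDecode", "DCTDecode"))); // true
        System.out.println(canCopyEncoded(List.of("FlateDecode")));              // false
        System.out.println(canCopyEncoded(List.of("CCITTFaxDecode")));           // false
    }
}
```

Under this reading, the `else` branch cannot be avoided for Flate-compressed images by adding more stop filters, because the bytes left over would be compressed samples without any file header.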

Source: https://stackoverflow.com/questions/60107248/how-to-extract-image-bytes-out-of-pdf-efficiently

Tags

java

image

image-processing

pdfbox
