How to reduce the size of a PNG image in a PDF (compress PNG in PDF)

Submitted by 只愿长相守 on 2020-06-17 13:19:07

Question


I want to reduce the size of a PDF file by replacing its high-resolution images with lower-resolution ones. To do that, I have to:

  1. extract the images (streams) from the PDF
  2. compress the images
  3. replace the images (streams) in the PDF with the compressed images
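
Step 2 is where transparency is easiest to lose. As a minimal sketch of downscaling that keeps the alpha channel, using only the JDK rather than the Thumbnailator-based compressor used later in this post (AlphaSafeScaler and scale are hypothetical names, not part of any library):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper: downscales an image into an ARGB target so alpha survives. */
final class AlphaSafeScaler {

    /** Returns a copy of src scaled by factor; the result always has an alpha channel. */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        // TYPE_INT_ARGB starts out fully transparent, so nothing is painted black
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(10, 10, BufferedImage.TYPE_INT_ARGB);
        BufferedImage small = scale(src, 0.6);
        System.out.println(small.getWidth() + "x" + small.getHeight()
                + ", alpha at (0,0): " + (small.getRGB(0, 0) >>> 24));
    }
}
```

Whether the alpha channel then survives step 3 depends on how the image is written back into the PDF, which is the problem described below.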

When I extract the PNG images and replace them, the transparent background turns black. To find out why, I extracted the images from the PDF. Strangely, the PDF uses two streams to store a single PNG, so extracting one PNG image yields two different images: an 8-bit image (object 1 below, DeviceGray) and a 24-bit color image (object 2 below, which references object 1 through /SMask).

...
1 0 obj
<</Type/XObject/Subtype/Image/Width 1920/Height 1035/Length 24720/ColorSpace/DeviceGray/BitsPerComponent 8/Filter/FlateDecode>>stream
...
endstream
endobj
2 0 obj
<</Type/XObject/Subtype/Image/Width 1920/Height 1035/SMask 1 0 R/Length 47751/ColorSpace[/CalRGB<</Gamma[2.2 2.2 2.2]/Matrix[0.41239 0.21264 0.01933 0.35758 0.71517 0.11919 0.18045 0.07218 0.9504]/WhitePoint[0.95043 1 1.09]>>]/Intent/Perceptual/BitsPerComponent 8/Filter/FlateDecode>>stream
...
endstream
...

Original image (32-bit color image with a transparent background): [image: original image]

The 8-bit image: [image: 8-bit color]

The 24-bit color image: [image: 24-bit color]
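
The two extracted images fit together as follows: in the dump, object 2 points at object 1 through /SMask 1 0 R, so the color samples sit in one stream and the alpha channel in a separate grayscale stream (a soft mask). The sketch below uses only the JDK, with hypothetical names (AlphaSplit, colorPart, alphaPart, merge), to mimic that split on a BufferedImage. It also shows why the color part alone can have an opaque black background: a transparent pixel whose stored color happens to be black becomes plain black once the alpha is dropped.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper that splits an ARGB image the way a PDF stores it. */
final class AlphaSplit {

    /** Color part: RGB samples with alpha discarded (like the 24-bit stream). */
    static BufferedImage colorPart(BufferedImage argb) {
        BufferedImage rgb = new BufferedImage(argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                rgb.setRGB(x, y, 0xFF000000 | (argb.getRGB(x, y) & 0xFFFFFF));
            }
        }
        return rgb;
    }

    /** Alpha part: one 8-bit gray sample per pixel (like the DeviceGray soft mask). */
    static BufferedImage alphaPart(BufferedImage argb) {
        BufferedImage gray = new BufferedImage(argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = gray.getRaster();
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                raster.setSample(x, y, 0, argb.getRGB(x, y) >>> 24);
            }
        }
        return gray;
    }

    /** Recombines a color part and an alpha part into one ARGB image. */
    static BufferedImage merge(BufferedImage rgb, BufferedImage alpha) {
        BufferedImage out = new BufferedImage(rgb.getWidth(), rgb.getHeight(), BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < rgb.getHeight(); y++) {
            for (int x = 0; x < rgb.getWidth(); x++) {
                int a = alpha.getRaster().getSample(x, y, 0);
                out.setRGB(x, y, (a << 24) | (rgb.getRGB(x, y) & 0xFFFFFF));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        src.setRGB(1, 0, 0x80FF0000); // half-transparent red; pixel (0,0) stays transparent
        System.out.println(Integer.toHexString(colorPart(src).getRGB(0, 0)));
        System.out.println(alphaPart(src).getRaster().getSample(1, 0, 0));
    }
}
```

Any replacement that keeps only the color part, or that drops the link to the mask, loses the transparency.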

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.12</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.16</version>
</dependency>

ImageExtractor extracts the images from the PDF file.

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
// assuming SLF4J is the logging API in use
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;

public class ImageExtractor {

    private static final Logger log = LoggerFactory.getLogger(ImageExtractor.class);

    public void extract(File pdf, File imageDir) throws IOException {
        if (!imageDir.exists()) {
            imageDir.mkdirs();
        }
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(pdf)) {
            PDPageTree list = document.getPages();
            System.out.println("PDPageTree#count: " + list.getCount());
            int pageIndex = 1;
            for (PDPage page : list) {
                PDResources pdResources = page.getResources();
                System.out.println(pdResources.toString());
                for (COSName c : pdResources.getXObjectNames()) {
                    System.out.println("PDResources[" + pageIndex + "]#COSName: " + c.getName());
                    PDXObject o = pdResources.getXObject(c);
                    System.out.println("PDResources[" + pageIndex + "]#PDXObject: " + o.toString());
                    // https://github.com/mkl-public/testarea-itext5/blob/master/src/test/java/mkl/testarea/itext5/extract/ImageExtraction.java
                    if (o instanceof PDImageXObject) {
                        PDImageXObject img = (PDImageXObject) o;
                        File file = new File(imageDir, pageIndex + "-" + System.nanoTime() + "." + img.getSuffix());
                        ImageIO.write(img.getImage(), img.getSuffix(), file);
                    }
                }
                pageIndex++;
            }
        }
        log.info("Images have been extracted successfully! Check your images folder.");
    }
}

ReplaceHightResolutionImage is the code I use to reduce the size of the PDF.

package io.gitlab.donespeak.tutorial.pdf.reducesize.itext;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfStream;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.ImageCompressor;
import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.SimpleCompress;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ReplaceHightResolutionImage {

    private ImageCompressor compressor;
    private double quality;
    private double scale;

    public ReplaceHightResolutionImage(double quality, double scale) {
        this.compressor = new SimpleCompress();
        this.quality = quality;
        this.scale = scale;
    }

    public ReplaceHightResolutionImage(double quality, double scale, ImageCompressor compressor) {
        this.compressor = compressor;
        this.quality = quality;
        this.scale = scale;
    }

    public void replace(File pdf, File output) throws IOException, DocumentException {
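        PdfReader reader;
        // PdfReader reads the stream to the end but does not close it, so close it here
        try (FileInputStream in = new FileInputStream(pdf)) {
            reader = new PdfReader(in);
        }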
        int n = reader.getXrefSize();
        PdfObject object;
        PRStream stream;

        for (int i = 0; i < n; i++) {

            object = reader.getPdfObject(i);
            stream = findImageStream(object);
            if (stream == null) {
                continue;
            }
            PdfImageObject pdfImageObject = new PdfImageObject(stream);
            BufferedImage bi = pdfImageObject.getBufferedImage();
            if (bi == null) {
                continue;
            }
            System.out.println("PdfReader#Xref: " + i + "," + pdfImageObject.getFileType());
            BufferedImage resultImage = compressor.compress(bi, pdfImageObject.getFileType(), quality, scale);
            replaceImage(stream, resultImage);
        }

        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(output));
        // furtherCompress(reader, stamper);
        stamper.close();
        reader.close();
    }

    private void furtherCompress(PdfReader reader, PdfStamper stamper) throws DocumentException {
        reader.removeFields();
        reader.removeUnusedObjects();
        stamper.setFullCompression();
        stamper.getWriter().setCompressionLevel(PdfStream.DEFAULT_COMPRESSION);
    }

    private PRStream findImageStream(PdfObject object) {
        PRStream stream;
        if (object == null || !object.isStream()) {
            return null;
        }
        stream = (PRStream)object;
        System.out.println(stream.getAsName(PdfName.SUBTYPE));
        if (!PdfName.IMAGE.equals(stream.getAsName(PdfName.SUBTYPE))) {
            // not an image XObject
            return null;
        }
        PdfName pdfName = stream.getAsName(PdfName.FILTER);
        if (!PdfName.DCTDECODE.equals(pdfName) && !PdfName.FLATEDECODE.equals(pdfName)) {
            return null;
        }
        // if (PdfName.DCTDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.JPG.getFileExtension();
        // } else if (PdfName.JPXDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.JP2.getFileExtension();
        // } else if (PdfName.FLATEDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.PNG.getFileExtension();
        // } else if (PdfName.LZWDECODE.equals(filter)) {
        //     return PdfImageObject.ImageBytesType.CCITT.getFileExtension();
        // }
        return stream;
    }

    private void replaceImage(PRStream stream, BufferedImage resultImage) throws IOException {

        ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
        ImageIO.write(resultImage, "JPG", imgBytes);

        stream.clear();
        stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
        stream.put(PdfName.TYPE, PdfName.XOBJECT);
        stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
        stream.put(PdfName.FILTER, PdfName.DCTDECODE);
        stream.put(PdfName.WIDTH, new PdfNumber(resultImage.getWidth()));
        stream.put(PdfName.HEIGHT, new PdfNumber(resultImage.getHeight()));
        stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
        stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
    }
}
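
The replaceImage method above re-encodes every image as a JPEG (DCTDecode). JPEG has no alpha channel, and stream.clear() also drops the /SMask entry that pointed at the grayscale stream, which would explain the black background. If giving up transparency is acceptable, one option is to composite the image over a solid color before encoding. The sketch below uses only the JDK; JpegFlatten, flatten and toJpeg are hypothetical names, and it assumes the BufferedImage still carries its alpha channel, which may not be true of the image iText returns for the color stream alone.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical helper: removes alpha by painting over a background, then encodes as JPEG. */
final class JpegFlatten {

    /** Paints src over a solid background and returns an opaque RGB image. */
    static BufferedImage flatten(BufferedImage src, Color background) {
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null); // transparent pixels leave the background visible
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Encodes an opaque RGB image as JPEG with the given quality (0.0 to 1.0). */
    static byte[] toJpeg(BufferedImage rgb, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage src = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        byte[] jpeg = toJpeg(flatten(src, Color.WHITE), 0.8f);
        System.out.println("JPEG bytes: " + jpeg.length);
    }
}
```

If the transparency has to be kept, JPEG alone is not enough: the alpha channel needs to be written back as a soft mask next to the color data.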
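package io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress;

import net.coobird.thumbnailator.Thumbnails;

import java.awt.image.BufferedImage;
import java.io.IOException;

public class ThumbnailatorCompressor implements ImageCompressor {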

    @Override
    public BufferedImage compress(BufferedImage image, String imageFormat, double quality, double scale) throws IOException {
        System.out.println("ThumbnailatorCompressor#type: " + image.getType());
        // int imageType = "png".equalsIgnoreCase(imageFormat)? BufferedImage.TYPE_INT_ARGB: image.getType();
        BufferedImage thumbnail = Thumbnails.of(image)
            .imageType(image.getType())
            .scale(scale)
            .outputQuality(quality)
            // .outputFormat(imageFormat)
            .useOriginalFormat()
            .asBufferedImage();

        return thumbnail;
    }
}

Test files:

  • horse.pdf
  • horse.png

public class ReplaceHightResolutionImageTest {

    @Test
    public void reduceWithThumbnailatorCompressor() throws IOException, DocumentException {
        double quality = 1d;
        double scale = 0.6d;
        File pdf = new File("pdf/asset/horse.pdf");
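        File output = new File("pdf/target/output", "replaced-" + quality + "-" + scale + ".pdf");
        // FileOutputStream does not create missing parent directories
        output.getParentFile().mkdirs();
        // use the compressor the test name refers to
        ReplaceHightResolutionImage replacer = new ReplaceHightResolutionImage(quality, scale, new ThumbnailatorCompressor());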
        replacer.replace(pdf, output);
    }
}

Answer 1:


Here is a workable, though not fully satisfactory, answer. It compresses JPG and PNG images well. Its one shortcoming: when the same image is reused on several pages, each reference is treated as a separate image and replaced with a new stream of its own, which can make the file larger. The dump below shows such a PDF: both pages (objects 4 and 6) reference the same image stream, 1 0 R.

1 0 obj
<</Type/XObject/Subtype/Image/Width 1002/Height 564/Filter/DCTDecode/ColorSpace/DeviceRGB/BitsPerComponent 8/Length 89149>>stream
...
endstream
endobj
2 0 obj
<</Length 106/Filter/FlateDecode>>stream
...
endstream
endobj
4 0 obj
<</Type/Page/MediaBox[0 0 595 842]/Resources<</XObject<</img0 1 0 R>>>>/Contents 2 0 R/Parent 3 0 R>>
endobj
5 0 obj
<</Length 106/Filter/FlateDecode>>stream
...
endstream
endobj
6 0 obj
<</Type/Page/MediaBox[0 0 595 842]/Resources<</XObject<</img0 1 0 R>>>>/Contents 5 0 R/Parent 3 0 R>>
endobj
package io.gitlab.donespeak.tutorial.pdf.reducesize;

import io.gitlab.donespeak.tutorial.pdf.reducesize.imagecompress.ThumbnailatorCompressor;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class RemoveAllImageFromPdf {

    public static void extractImages(File input, File imageDir) throws IOException {
        // File.delete() cannot remove a non-empty directory, so just make sure the directory exists
        if (!imageDir.exists()) {
            imageDir.mkdirs();
        }
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(input)) {
            int pageIndex = 1;
            PDDocumentCatalog catalog = document.getDocumentCatalog();
            for (PDPage page : catalog.getPages()) {
                PDResources pdResources = page.getResources();
                System.out.println(pdResources.toString());
                for (COSName c : pdResources.getXObjectNames()) {
                    System.out.println("PDResources[" + pageIndex + "]#COSName: " + c.getName());
                    PDXObject o = pdResources.getXObject(c);
                    System.out.println("PDResources[" + pageIndex + "]#PDXObject: " + o.toString());
                    // https://github.com/mkl-public/testarea-itext5/blob/master/src/test/java/mkl/testarea/itext5/extract/ImageExtraction.java
                    if (o instanceof PDImageXObject) {
                        PDImageXObject img = (PDImageXObject) o;
                        System.out.println(img.getSuffix() + "-" + img.getBitsPerComponent() + "-" + img.getColorSpace());
                        File file = new File(imageDir, pageIndex + "-" + c.getName() + "-" + img.getColorSpace() + "-" + System.nanoTime() + "." + img.getSuffix());
                        ImageIO.write(img.getImage(), img.getSuffix(), file);
                    }
                }
                pageIndex++;
            }
        }
    }

    /**
     * Rescales and recompresses every image XObject referenced from the page resources.
     *
     * @param input  the source PDF
     * @param output the file the compressed PDF is written to
     * @throws IOException if the PDF cannot be read or written
     */
    public static void compress(File input, File output) throws IOException {
        if (!output.getParentFile().exists()) {
            output.getParentFile().mkdirs();
        }
        ThumbnailatorCompressor compressor = new ThumbnailatorCompressor();
        // try-with-resources closes the document even if compression fails
        try (PDDocument document = PDDocument.load(input)) {
            int pageIndex = 1;
            PDDocumentCatalog catalog = document.getDocumentCatalog();

            for (PDPage page : catalog.getPages()) {
                PDResources pdResources = page.getResources();
                for (COSName c : pdResources.getXObjectNames()) {
                    System.out.println("PDResources[" + pageIndex + "]#COSName: " + c.getName());
                    PDXObject o = pdResources.getXObject(c);
                    System.out.println("PDResources[" + pageIndex + "]#PDXObject: " + o.toString());
                    // https://github.com/mkl-public/testarea-itext5/blob/master/src/test/java/mkl/testarea/itext5/extract/ImageExtraction.java
                    if (o instanceof PDImageXObject) {
                        PDImageXObject img = (PDImageXObject) o;
                        BufferedImage bufferedImage = compressor.compress(img.getImage(), img.getSuffix(), 0.8, 0.5);
                        System.out.println("img(w, h): (" + img.getWidth() + "," + img.getHeight() + ")");
                        System.out.println("bufferedImage(w, h): (" + bufferedImage.getWidth() + "," + bufferedImage.getHeight() + ")");
                        PDImageXObject imgNew;
                        if ("png".equalsIgnoreCase(img.getSuffix())) {
                            // lossless encoding; an alpha channel is stored as a soft mask
                            imgNew = LosslessFactory.createFromImage(document, bufferedImage);
                        } else {
                            imgNew = JPEGFactory.createFromImage(document, bufferedImage);
                        }
                        // replaces the resource entry for this page only
                        pdResources.put(c, imgNew);
                    }
                }
                pageIndex++;
            }
            document.save(output);
        }
    }
}

Processing the objects in the document directly, with one of the following methods, might solve the problem above, but I have no idea how to replace a stream that way.

new com.itextpdf.text.pdf.PdfReader(new FileInputStream(pdf)).getPdfObject(i);
// or
org.apache.pdfbox.pdmodel.PDDocument.load(pdf).getDocument().getObjects()
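
Short of walking the raw objects, one way to avoid building a new stream for every reference to a shared image might be to remember which source objects have already been replaced and reuse the replacement. The sketch below shows only that bookkeeping, in plain JDK; ReplaceOnceCache, Factory and replacementFor are hypothetical names, and the PDFBox wiring is described afterwards as an assumption rather than tested code.

```java
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical helper: remembers the replacement built for each source object, by identity. */
final class ReplaceOnceCache<K, V> {

    /** Builds a replacement; declared to throw IOException like image encoding usually does. */
    interface Factory<K, V> {
        V create(K source) throws IOException;
    }

    // IdentityHashMap compares keys with ==, so only the very same object counts as "seen"
    private final Map<K, V> replacements = new IdentityHashMap<>();

    /** Returns the cached replacement for source, building it on first use only. */
    V replacementFor(K source, Factory<K, V> factory) throws IOException {
        V replacement = replacements.get(source);
        if (replacement == null) {
            replacement = factory.create(source);
            replacements.put(source, replacement);
        }
        return replacement;
    }

    public static void main(String[] args) throws IOException {
        ReplaceOnceCache<Object, String> cache = new ReplaceOnceCache<>();
        Object sharedImage = new Object();
        String first = cache.replacementFor(sharedImage, s -> "compressed copy");
        String second = cache.replacementFor(sharedImage, s -> "not built again");
        System.out.println(first == second);
    }
}
```

In the compress method above, the key would presumably be img.getCOSObject() and the value the new PDImageXObject, so that pdResources.put(c, ...) receives the same replacement on every page. This relies on PDFBox handing back the same underlying stream instance for an image shared by several pages, which is an assumption to verify against the PDFBox version in use.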


Source: https://stackoverflow.com/questions/61590198/how-to-reduce-the-size-of-png-image-in-pdf-compress-png-in-pdf
