In PDFBox, why does the file size become extremely large after saving?

Submitted by 女生的网名这么多〃 on 2019-12-01 03:45:27

Question

I am using PDFBox 1.8.8 to manipulate existing PDF files. After saving a document, the output file becomes several times larger than the original. This is undesirable.

How can I reduce the file size of output files?

How to replicate my situation

In the following code, PDFBox simply loads an existing PDF and then saves it. Nothing else is done, yet the file size still becomes several times larger.

Below are links to two sample input files. For input1.pdf, file size increases from 6MB to 50MB. For input2.pdf, file size increases from 0.4MB to 1.3MB.

https://dl.dropboxusercontent.com/u/13566649/samplePDF/input1.pdf
https://dl.dropboxusercontent.com/u/13566649/samplePDF/input2.pdf

import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.exceptions.*;


class Test {

    public static void main(String[] args) throws IOException, COSVisitorException {

        PDDocument document = PDDocument.load("input1.pdf");
        document.save("output.pdf");
        document.close();       
    }
}   
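
To put a number on the growth described above, the input and output files can be compared using only the JDK. This is a minimal sketch and not part of the original question: the class name SizeReport and its methods are hypothetical, and it assumes both files exist at the given paths (defaulting to the file names used in the question).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): reports how much a file grew after saving.
final class SizeReport {

    // Ratio of output size to input size; e.g. 6 MB -> 50 MB gives roughly 8.3.
    static double growthFactor(long inputBytes, long outputBytes) {
        if (inputBytes <= 0) {
            throw new IllegalArgumentException("inputBytes must be positive");
        }
        return (double) outputBytes / inputBytes;
    }

    // Reads both file sizes from disk and returns the growth factor.
    static double growthFactor(Path input, Path output) throws IOException {
        return growthFactor(Files.size(input), Files.size(output));
    }

    public static void main(String[] args) throws IOException {
        // Defaults match the file names used in the question.
        Path input = Paths.get(args.length > 0 ? args[0] : "input1.pdf");
        Path output = Paths.get(args.length > 1 ? args[1] : "output.pdf");
        System.out.printf("%s: %d bytes%n", input, Files.size(input));
        System.out.printf("%s: %d bytes%n", output, Files.size(output));
        System.out.printf("growth factor: %.2f%n", growthFactor(input, output));
    }
}
```

Running it after each experiment makes it easy to see whether a change to the saving code had any effect on the output size.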

What I have tried

I have tried using the addCompression() method of the PDStream class, as in the following code. It does not change anything: the output file size stays the same.

import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.exceptions.*;

class Test2 {

    public static void main(String[] args) throws IOException, COSVisitorException {

        PDDocument document = PDDocument.load("input1.pdf");

        // Request compression on the content stream of every page
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(i);
            page.getContents().addCompression();
        }

        document.save("output.pdf");
        document.close();
    }
}
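
For readers unfamiliar with what stream compression does, here is a small JDK-only sketch. It assumes that addCompression() applies Flate (zlib/deflate) compression, the algorithm behind the PDF FlateDecode filter, to a page's content stream; that is my understanding and is not confirmed in the question. The class FlateDemo and the sample text are hypothetical, and the sketch does not use PDFBox or explain why the size did not change above. It only shows how repetitive content-stream-like text behaves under deflate.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class: JDK only, no PDFBox involved.
final class FlateDemo {

    // Compresses bytes with zlib/deflate.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverses deflate(); used here only to check the round trip.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && inflater.needsInput()) {
                break; // guard against truncated input
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // Builds made-up text that resembles uncompressed page content operators.
    static byte[] sampleContent(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("BT /F1 12 Tf 72 ").append(700 - i).append(" Td (Hello) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContent(500);
        byte[] packed = deflate(raw);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + packed.length);
    }
}
```

Text-like data of this kind shrinks considerably under deflate, which is why uncompressed streams in a saved file are one thing worth checking when the output is unexpectedly large.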

Answer 1:


I wrote this strange code and it works for me (Apache PDFBox v.2.0.8):

private void saveCompressedPDF(PDDocument srcDoc, OutputStream os) throws IOException {
    PDDocument outDoc = new PDDocument();
    outDoc.setDocumentInformation(srcDoc.getDocumentInformation());
    for (PDPage srcPage : srcDoc.getPages()) {
        new PDPageContentStream(outDoc, srcPage,
                PDPageContentStream.AppendMode.APPEND, true).close();
        outDoc.addPage(srcPage);
    }
    outDoc.save(os);
    outDoc.close();
}


Source: https://stackoverflow.com/questions/28624660/in-pdfbox-why-does-file-size-becomes-extremely-large-after-saving
