Save tiff CCITTFaxDecode (from PDF page) using iText and Java

佐手、 posted on 2019-11-27 18:38:49

Question


I'm using iText to extract embedded images and save them as separate files. The .jpg and .png files come out ok, but I cannot extract tiff images that have the CCITTFaxDecode encoding.

Does anyone have a way of saving the tiff files?

I found some sample C# code that uses iTextSharp in "Extracting image from PDF with /CCITTFaxDecode filter". It indicates that a separate TIFF library is needed to write out the results. According to that article, the "CCITTFaxDecode" compression corresponds to Compression.CCITTFAX4 in the TIFF library.

To use that article's method, I need to:

  1. Get a TIFF library. The Java Image I/O API will allow you to read and write TIFF files among other formats, for example: BufferedImage image = ImageIO.read( new File( "image.tif" ) );
  2. Find out the Java equivalent of the code for getting the bitmap's properties from the PDF, for example pd.Get(PdfName.WIDTH).ToString() (which is C#).
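For what it's worth, the linked article's approach amounts to wrapping the raw /CCITTFaxDecode stream bytes in a TIFF container. Below is a minimal sketch of that idea in plain Java, without a TIFF library. The class name CcittTiffWrapper and its parameters are hypothetical. It assumes Group 4 data (a negative /K in /DecodeParms), a single strip, and no other filter in the chain; width and height would come from the image dictionary's /Width and /Height entries. The mapping from /BlackIs1 to the photometric tag is my reading of the PDF and TIFF specifications, so check it against a real file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: puts a minimal little-endian TIFF header in front of raw CCITT Group 4 data.
final class CcittTiffWrapper {
    private static final int TYPE_SHORT = 3;
    private static final int TYPE_LONG = 4;
    private static final int ENTRY_COUNT = 9;
    // 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte "next IFD" offset
    static final int DATA_OFFSET = 8 + 2 + ENTRY_COUNT * 12 + 4;

    static byte[] wrap(byte[] g4Data, int width, int height, boolean blackIs1) {
        ByteBuffer buf = ByteBuffer.allocate(DATA_OFFSET + g4Data.length)
                .order(ByteOrder.LITTLE_ENDIAN);

        // Header: byte order "II", magic number 42, offset of the first IFD
        buf.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8);

        // Image file directory; entries must be sorted by tag number
        buf.putShort((short) ENTRY_COUNT);
        entry(buf, 256, TYPE_LONG, width);             // ImageWidth
        entry(buf, 257, TYPE_LONG, height);            // ImageLength
        entry(buf, 258, TYPE_SHORT, 1);                // BitsPerSample
        entry(buf, 259, TYPE_SHORT, 4);                // Compression: 4 = CCITT T.6 (Group 4)
        entry(buf, 262, TYPE_SHORT, blackIs1 ? 1 : 0); // Photometric: 0 = WhiteIsZero (assumed mapping)
        entry(buf, 273, TYPE_LONG, DATA_OFFSET);       // StripOffsets
        entry(buf, 277, TYPE_SHORT, 1);                // SamplesPerPixel
        entry(buf, 278, TYPE_LONG, height);            // RowsPerStrip: whole image in one strip
        entry(buf, 279, TYPE_LONG, g4Data.length);     // StripByteCounts
        buf.putInt(0);                                 // no further IFDs

        buf.put(g4Data);                               // the compressed strip itself
        return buf.array();
    }

    // Each entry holds a single value; in little-endian order a SHORT value
    // written as a 4-byte int is already left-justified as TIFF requires.
    private static void entry(ByteBuffer buf, int tag, int type, int value) {
        buf.putShort((short) tag).putShort((short) type).putInt(1).putInt(value);
    }
}
```

The returned array can be written to a .tif file as-is. With iText 5, the undecoded stream bytes can presumably be fetched with PdfReader.getStreamBytesRaw, which mirrors the call used in the C# article.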

Answer 1:


I extracted a TIFF image from a scanned PDF (that is, one where every page is an image) in the following way:

...
PdfReader reader = new PdfReader("source.pdf");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener("destination.jpg");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    parser.processContent(i, listener);
}
...

Code of the MyImageRenderListener class:

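import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

// iText 5 packages
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

class MyImageRenderListener implements RenderListener {
    protected String path = "";

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    public void beginTextBlock() {
    }

    public void endTextBlock() {
    }

    public void renderImage(ImageRenderInfo renderInfo) {
        try {
            PdfImageObject image = renderInfo.getImage();
            if (image == null) {
                return;
            }
            // /Filter may be a single name or an array, so it is not cast to PdfName.
            PdfObject filter = image.get(PdfName.FILTER);

            if (PdfName.CCITTFAXDECODE.equals(filter)) {
                BufferedImage bufferedImage = image.getBufferedImage();
                if (bufferedImage == null) {
                    return; // the image data could not be decoded
                }
                // Save the TIFF image as JPEG. The same path is used for every
                // image, so a later image overwrites an earlier one.
                try (OutputStream os = new FileOutputStream(path)) {
                    ImageIO.write(bufferedImage, "jpg", os);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void renderText(TextRenderInfo renderInfo) {
    }
}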
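
The listener above saves the decoded image as JPEG. If a TIFF file is wanted instead, Java 9 and later ship a TIFF plugin for Image I/O (older versions need an extra plugin on the classpath). Here is a minimal sketch, assuming that plugin is available; the class and method names are hypothetical. CCITT compression only applies to 1-bit images, so the image is converted to bilevel first.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: writes a BufferedImage as a TIFF with CCITT Group 4 compression.
final class BilevelTiffWriter {

    // Returns a 1-bit black-and-white copy unless the image already is one.
    static BufferedImage toBilevel(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_BYTE_BINARY
                && src.getColorModel().getPixelSize() == 1) {
            return src;
        }
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = dst.createGraphics();
        g.drawImage(src, 0, 0, null);
        g.dispose();
        return dst;
    }

    static void writeGroup4Tiff(BufferedImage image, OutputStream out) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF ImageWriter found (needs Java 9+ or a TIFF plugin)");
        }
        ImageWriter writer = writers.next();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            // "CCITT T.6" is the Image I/O name for Group 4 fax compression
            param.setCompressionType("CCITT T.6");
            writer.write(null, new IIOImage(toBilevel(image), null, null), param);
        } finally {
            writer.dispose();
        }
    }
}
```

Inside renderImage, the ImageIO.write(bufferedImage, "jpg", ...) call could then be replaced with BilevelTiffWriter.writeGroup4Tiff(bufferedImage, os), together with a .tif file name.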


Source: https://stackoverflow.com/questions/6851385/save-tiff-ccittfaxdecode-from-pdf-page-using-itext-and-java
