Re-encoding only images of a PDF? (or, ghostscript fails on 8-bit RGB while optimizing)

妖精的绣舞 submitted on 2019-12-31 06:23:08

Question


I need to optimize a number of big PDF documents for file size, so I tried using ghostscript, invoked like this:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -sOutputFile=output-my-doc.pdf input-my-doc.pdf

I can see this running for some pages, but then on particular pages it crashes.

I updated to gs version 9.02 and see the same behavior. After bursting the document into separate pages and running the command above on each page, I could confirm which pages are the problematic ones. In fact, the error occurs even if I call just gs input-my-doc-pageX.pdf: this starts a viewer, and I could see the text being typeset until it reached an image, at which point it crashed.
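The per-page check described above can be sketched as a small shell function. This is only an illustration, not the procedure actually used in the question: instead of bursting the document, it restricts Ghostscript to one page at a time with -dFirstPage and -dLastPage. The function name find_bad_pages is hypothetical, and the page count is assumed to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print the number of every page on which gs fails.
# Usage: find_bad_pages INPUT.pdf PAGE_COUNT
find_bad_pages() {
    input=$1
    npages=$2
    # Scratch output file; only the exit status of gs matters here.
    scratch=$(mktemp) || return 1
    page=1
    while [ "$page" -le "$npages" ]; do
        # Same options as the optimizing run, limited to a single page.
        if ! gs -q -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
                -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH \
                -dFirstPage="$page" -dLastPage="$page" \
                -sOutputFile="$scratch" "$input" >/dev/null 2>&1; then
            echo "$page"
        fi
        page=$((page + 1))
    done
    rm -f "$scratch"
}
```

Calling find_bad_pages input-my-doc.pdf followed by the document's page count would then list one failing page number per line, assuming gs exits with a non-zero status on those pages as it does in the log further down.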

So I could confirm that in my case, gs crashes on specific images - and finally I can also provide a minimal working (or rather, non-working) example, which demonstrates the problem (below). In particular, the problem seems to be 8-bit RGB images, specified in a certain way.

Now, I cannot tell if this is a bug, but since I need to get this done, I was thinking that maybe I could "cheat" Ghostscript by first running the PDFs through an application that leaves them essentially untouched, except that it re-encodes the images to a single format (say, PNG), so that the gs optimizer can then run over these files without crashing.

What options do I have to re-encode only the images of a given PDF using the command line in Linux?

Many thanks in advance for any answers,
Cheers!

 

PS: The test case is basically the source-code PDF example in the post: Imagemagick: generate raw image data for PDF flate embedding?.

That PDF (hello2.pdf) opens just fine in, say, evince, but since its xref table is corrupt, I repair it:

$ pdftk hello2.pdf output hello2O.pdf
$ qpdf --check hello2O.pdf 
checking hello2O.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found
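The repair-then-verify step from the transcript above can be wrapped in one function so that it is easy to repeat for other files. This is a minimal sketch: the name repair_and_check is hypothetical, and it assumes pdftk and qpdf are on the PATH and are invoked exactly as shown above.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a PDF with pdftk, then validate it with qpdf.
# Usage: repair_and_check BROKEN.pdf REPAIRED.pdf
repair_and_check() {
    src=$1
    dst=$2
    # pdftk rewrites the file, which is what repaired the xref table above.
    pdftk "$src" output "$dst" || return 1
    # The exit status of qpdf --check becomes the function's exit status.
    qpdf --check "$dst"
}
```

For the test case this would be called as repair_and_check hello2.pdf hello2O.pdf.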

The repaired file hello2O.pdf also opens fine in evince - however, when I try to run the above gs optimizing command on it, it fails:

$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -sOutputFile=optihello2O.pdf hello2O.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Loading NimbusSanL-Regu font from /usr/share/ghostscript/9.02/Resource/Font/NimbusSanL-Regu... 2756020 1410650 1869284 568021 3 done.
Error: /undefined in --run--
Operand stack:
   --dict:6/15(L)--   false   --dict:11/19(L)--   --dict:4/4(L)--   --nostringval--   FlateDecode   --dict:4/4(L)--   0
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1910   1   3   %oparray_pop   1909   1   3   %oparray_pop   1893   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   576   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1160/1684(ro)(G)--   --dict:1/20(G)--   --dict:82/200(L)--   --dict:82/200(L)--   --dict:108/127(ro)(G)--   --dict:295/300(ro)(G)--   --dict:23/30(L)--   --dict:6/8(L)--   --dict:25/40(L)--   --dict:7/17(L)--
Current allocation mode is local
GPL Ghostscript 9.02: Unrecoverable error, exit code 1

Answer 1:


First, if you find a Ghostscript bug, please report it to us at http://bugs.ghostscript.com

Secondly, I suggest you update to the current shipping version, 9.05, which probably has this bug fixed.
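To see whether the installed Ghostscript is already at least the suggested version, a check along the following lines may help. It is a sketch only: the function name gs_at_least is hypothetical, and it assumes a sort that supports -V (GNU coreutils, as found on most Linux systems).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the installed gs is at least version $1.
# Usage: gs_at_least 9.05
gs_at_least() {
    wanted=$1
    # gs --version prints just the version number, e.g. 9.02.
    current=$(gs --version) || return 2
    # sort -V orders version strings; if the wanted version sorts first
    # (or both are equal), the installed version is new enough.
    lowest=$(printf '%s\n%s\n' "$wanted" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
}
```

For example, gs_at_least 9.05 || echo "Ghostscript is older than 9.05" prints the message on a system that still has 9.02.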



Source: https://stackoverflow.com/questions/10936142/re-encoding-only-images-of-a-pdf-or-ghostscript-fails-on-8-bit-rgb-while-opti