How can I remove all images from a PDF?


感动是毒

I want to remove all images from a PDF file.

The page layouts should not change. All images should be replaced by empty space.

How can this be done?


2 Answers

无人及你

2020-11-29 06:17

Meanwhile, recent Ghostscript releases offer a much simpler way to remove all images from a PDF: add the -dFILTERIMAGE parameter to the command line.

gs -o noimages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf

Even better, you can also remove all text or all vector drawing elements from a PDF by specifying -dFILTERTEXT or -dFILTERVECTOR.

You can also combine these -dFILTER* parameters in any way you like to achieve the required result. (Combining all three will, of course, result in "empty" pages.)
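Since the three switches combine freely, a small wrapper can turn a list of content kinds into the matching flags. The following is only a sketch: `filter_flags` is a hypothetical helper name, not part of Ghostscript, and the example prints the resulting command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ghostscript): map the content kinds
# "image", "text" and "vector" to the -dFILTER* switches described above.
filter_flags() {
    flags=""
    for kind in "$@"; do
        case "$kind" in
            image)  flag="-dFILTERIMAGE" ;;
            text)   flag="-dFILTERTEXT" ;;
            vector) flag="-dFILTERVECTOR" ;;
            *) echo "unknown content kind: $kind" >&2; return 1 ;;
        esac
        # Append the flag, separated by a single space.
        flags="${flags:+$flags }$flag"
    done
    printf '%s\n' "$flags"
}

# Dry run: print the command that would remove images and text.
# Drop the leading "echo" to actually invoke Ghostscript.
# The command substitution is left unquoted on purpose so that each
# flag becomes its own argument.
echo gs -o onlyVCT.pdf -sDEVICE=pdfwrite $(filter_flags image text) input.pdf
```

Keeping the example as a dry run makes it easy to inspect the generated command line before any output file is written.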

Here is a screenshot of an example PDF page which contains all 3 types of content mentioned above:

_{Screenshot of original PDF page containing "image", "vector" and "text" elements.}

Running the following 6 commands will create all 6 possible variations of remaining contents:

gs -o noIMG.pdf   -sDEVICE=pdfwrite -dFILTERIMAGE                input.pdf
gs -o noTXT.pdf   -sDEVICE=pdfwrite -dFILTERTEXT                 input.pdf
gs -o noVCT.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR               input.pdf
gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf
gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT  input.pdf
gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT  input.pdf

The following image illustrates the results:

_{Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.}


南方客

2020-11-29 06:19

I'm posting the answer myself, but the actual code is courtesy of Chris Liddell, a Ghostscript developer.

I used his original PostScript code and stripped off its other functions. Only the function which removes raster images remains. Other graphical page objects -- text sections, patterns and vector objects -- should remain untouched.

Copy the following code and save it as remove-images.ps:

%!PS

% Run as:
%
%      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
%                 ..... remove-images.ps <your-input-file>
%
% derived from Chris Liddell's original 'filter-obs.ps' script
% Adapted by @pdfkungfoo (on Twitter)

currentglobal true setglobal

32 dict begin

/debugprint { systemdict /DUMPDEBUG .knownget { {print flush} if}
              {pop} ifelse } bind def

/pushnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    gsave
    matrix currentmatrix
    nulldevice
    setmatrix
  } if
} bind def

/popnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    % this is hacky - some operators clear the current point
    % i.e.
    { currentpoint } stopped
    { grestore }
    { grestore moveto} ifelse
  } if
} bind def

/sgd {systemdict exch get def} bind def

systemdict begin

/_image /image sgd
/_imagemask /imagemask sgd
/_colorimage /colorimage sgd

/image {
  (\nIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _image
  /FILTERIMAGE //popnulldevice exec
} bind def

/imagemask {
  (\nIMAGEMASK\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _imagemask
  /FILTERIMAGE //popnulldevice exec
} bind def

/colorimage {
  (\nCOLORIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _colorimage
  /FILTERIMAGE //popnulldevice exec
} bind def

end
end

.bindnow

setglobal

Now run this command:

gs -o no-more-images-in-sample.pdf \
   -sDEVICE=pdfwrite \
   -dFILTERIMAGE \
   -dDELAYBIND \
   -dWRITESYSTEMDICT \
   remove-images.ps \
   sample.pdf

I tested the code with the official PDF specification, and it worked. The following two screenshots show page 750 of input and output PDFs:

If you wonder why something that looks like an image is still on the output page: it is not really a raster image but a 'pattern' in the original file, and therefore it is not removed.
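Besides comparing screenshots, one way to check the result is to count the raster images that a separate tool still finds in the output. The sketch below assumes `pdfimages` from poppler-utils is installed (it is not part of Ghostscript) and that `pdfimages -list` prints two header lines followed by one row per image; `count_images` is a hypothetical helper name.

```shell
#!/bin/sh
# count_images: read `pdfimages -list` output on stdin and print the
# number of image rows. Assumes the first two lines are the column
# header and the dashed separator line.
count_images() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage (assumes pdfimages is on the PATH):
#   pdfimages -list sample.pdf | count_images
#   pdfimages -list no-more-images-in-sample.pdf | count_images
```

If the filtering worked, the second count should be lower than the first. How content drawn through a 'pattern' is reported may differ, so treat the count as a rough check rather than proof.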
