How can I remove all images from a PDF?

前端 未结 2 1339
感动是毒
感动是毒 2020-11-29 05:37

I want to remove all images from a PDF file.

The page layouts should not change. All images should be replaced by empty space.

  • How can this be
相关标签:
2条回答
  • 2020-11-29 06:17

    Meanwhile the latest Ghostscript releases have a much nicer and easier to use method of removing all images from a PDF. The parameter to add to the command line is -dFILTERIMAGE

     gs -o noimages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf
    

    Even better, you can also remove all text or all vector drawing elements from a PDF by specifying -dFILTERTEXT or -dFILTERVECTOR.

    Of course, you can also combine any combination of these -dFILTER* parameters you want in order to achieve a required result. (Combining all three will of course result in "empty" pages.)

    Here is the screenshot from an example PDF page which contains all 3 types of content mentioned above:


    Screenshot of original PDF page containing "image", "vector" and "text" elements.


    Running the following 6 commands will create all 6 possible variations of remaining contents:

     gs -o noIMG.pdf   -sDEVICE=pdfwrite -dFILTERIMAGE                input.pdf
     gs -o noTXT.pdf   -sDEVICE=pdfwrite -dFILTERTEXT                 input.pdf
     gs -o noVCT.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR               input.pdf
    
     gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf 
     gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT  input.pdf
     gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT  input.pdf
    

    The following image illustrates the results:


    Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.


    0 讨论(0)
  • 2020-11-29 06:19

    I'm putting up the answer myself, but the actual code is by courtesy of Chris Liddell, Ghostscript developer.

    I used his original PostScript code and stripped off its other functions. Only the function which removes raster images remains. Other graphical page objects -- text sections, patterns and vector objects -- should remain untouched.

    Copy the following code and save it as remove-images.ps:

    %!PS
    
    % Run as:
    %
    %      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
    %                 ..... remove-images.ps <your-input-file>
    %
    % derived from Chris Liddell's original 'filter-obs.ps' script
    % Adapted by @pdfkungfoo (on Twitter)
    
    currentglobal true setglobal
    
    32 dict begin
    
    /debugprint     { systemdict /DUMPDEBUG .knownget { {print flush} if} 
                    {pop} ifelse } bind def
    
    /pushnulldevice {
      systemdict exch .knownget not
      {
        //false
      } if
    
      {
        gsave
        matrix currentmatrix
        nulldevice
        setmatrix
      } if
    } bind def
    
    /popnulldevice {
      systemdict exch .knownget not
      {
        //false
      } if
      {
        % this is hacky - some operators clear the current point
        % i.e.
        { currentpoint } stopped
        { grestore }
        { grestore moveto} ifelse
      } if
    } bind def
    
    /sgd {systemdict exch get def} bind def
    
    systemdict begin
    
    /_image /image sgd
    /_imagemask /imagemask sgd
    /_colorimage /colorimage sgd
    
    /image {
       (\nIMAGE\n) //debugprint exec /FILTERIMAGE //pushnulldevice exec
      _image
      /FILTERIMAGE //popnulldevice exec
    } bind def
    
    /imagemask
    {
      (\nIMAGEMASK\n) //debugprint exec
      /FILTERIMAGE //pushnulldevice exec
      _imagemask
      /FILTERIMAGE //popnulldevice exec
    } bind def
    
    /colorimage
    {
      (\nCOLORIMAGE\n) //debugprint exec
      /FILTERIMAGE //pushnulldevice exec
      _colorimage
      /FILTERIMAGE //popnulldevice exec
    } bind def
    
    end
    end
    
    .bindnow
    
    setglobal
    

    Now run this command:

    gs -o no-more-images-in-sample.pdf \
       -sDEVICE=pdfwrite               \
       -dFILTERIMAGE                   \
       -dDELAYBIND                     \
       -dWRITESYSTEMDICT               \
        remove-images.ps               \
        sample.pdf
    

    I tested the code with the official PDF specification, and it worked. The following two screenshots show page 750 of input and output PDFs:

    If you wonder why something that looks like an image is still on the output page: it is not really a raster image, but a 'pattern' in the original file, and therefor it is not removed.

    0 讨论(0)
提交回复
热议问题