Converting searchable PDF to a non-searchable PDF

余生分开走 2020-12-15 01:49

I have a PDF which is searchable and I need to convert it into a non-searchable one.

I tried using Ghostscript to change it to JPEG and then back to PDF which does

3 Answers
  • 2020-12-15 02:12

    I think converting to an image format such as JPEG is the way to go. It might be worth converting the pages to images, optimizing or reducing the size of those images, and then creating a PDF from them.
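
    The image route could look roughly like the sketch below. It assumes pdftoppm (from poppler-utils) and img2pdf are installed; neither tool is named in this answer, and the function name rasterize_pdf and the 150 dpi default are placeholders chosen for illustration. Lowering the resolution is the size-reduction step here.

```shell
# Rasterize every page of a PDF and rebuild a PDF from the page images.
# Usage: rasterize_pdf INPUT.pdf OUTPUT.pdf [DPI]
# Assumption: pdftoppm (poppler-utils) and img2pdf are on PATH.
rasterize_pdf() {
    local src="$1" dst="$2" dpi="${3:-150}"
    local tmp status=0
    tmp=$(mktemp -d) || return 1

    # One JPEG per page, written as $tmp/page-<n>.jpg. pdftoppm zero-pads
    # <n>, so the glob below expands in page order.
    pdftoppm -r "$dpi" -jpeg "$src" "$tmp/page" || status=$?

    # img2pdf wraps the JPEGs in a PDF without adding any text layer.
    if [ "$status" -eq 0 ]; then
        img2pdf "$tmp"/page-*.jpg -o "$dst" || status=$?
    fi

    rm -rf "$tmp"
    return "$status"
}
```

    For example, rasterize_pdf searchable.pdf flat.pdf 100 (hypothetical file names) would trade sharpness for a smaller file.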

  • 2020-12-15 02:18

    One possible way to produce a non-searchable vector PDF from a searchable vector PDF is:

    1. Burst the PDF into its single pages:

      pdftk file.pdf burst

    2. Convert each single page to SVG with pdftocairo, which is part of the poppler utils:

      • http://poppler.freedesktop.org/

    for f in *.pdf; do pdftocairo -svg "$f"; done
    

    3. Delete all the PDF files in the folder.

    4. Then, with the Batik rasterizer

    • http://xmlgraphics.apache.org/batik/tools/rasterizer.html

    convert all the SVG files back to PDF (this time the resulting PDFs stay vector-based but are no longer searchable):

    java -jar ./batik-rasterizer.jar -m application/pdf *.svg
    

    Final step: join all the resulting single-page PDFs into one multipage PDF file:

    pdftk *.pdf cat output out.pdf
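
    The steps above can also be run inside a scratch directory, so that no PDF in the working folder has to be deleted. The following is only a sketch under assumptions: the function name pdf_via_svg and the BATIK_JAR variable are made up for illustration, and the explicit output names passed to pdftk burst and pdftocairo, as well as Batik's -d option, are used here so that every intermediate file lands in the scratch directory.

```shell
# Rebuild a PDF through SVG so pages stay vector but text is not searchable.
# Usage: pdf_via_svg INPUT.pdf OUTPUT.pdf
# Assumption: BATIK_JAR (hypothetical variable) points at batik-rasterizer.jar.
pdf_via_svg() {
    local src="$1" dst="$2" jar="${BATIK_JAR:-./batik-rasterizer.jar}"
    local tmp f
    tmp=$(mktemp -d) || return 1
    # On failure the scratch directory is left behind for inspection.

    # 1. Burst into one PDF per page inside the scratch directory.
    pdftk "$src" burst output "$tmp/pg_%04d.pdf" || return 1

    # 2./3. Convert each page to SVG, then drop the intermediate page PDF.
    for f in "$tmp"/pg_*.pdf; do
        pdftocairo -svg "$f" "${f%.pdf}.svg" || return 1
        rm -f "$f"
    done

    # 4. Convert every SVG back to PDF, writing into the scratch directory.
    java -jar "$jar" -m application/pdf -d "$tmp" "$tmp"/pg_*.svg || return 1

    # 5. Join the single-page PDFs in page order.
    pdftk "$tmp"/pg_*.pdf cat output "$dst" || return 1

    rm -rf "$tmp"
}
```

    A call such as pdf_via_svg file.pdf out.pdf would then leave file.pdf untouched.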
    
  • 2020-12-15 02:19

    You can use Ghostscript to achieve that. You need 2 steps:

    1. Convert the PDF to a PostScript file in which all fonts used are converted to outline shapes. The key here is the -dNOCACHE parameter:

      gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf

    2. Convert the PS back to PDF (and maybe delete the intermediate PS file afterwards):

      gs -o somepdf-with-outlines.pdf -sDEVICE=pdfwrite somepdf.ps
      rm somepdf.ps

    Note that the resulting PDF will very likely be larger than the original. (Also, unless you add more command line parameters, all images in the original PDF will likely be re-converted according to Ghostscript's built-in defaults. But the quality should be better than your own attempt to use Ghostscript...)


    Update

    Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:

     -dNoOutputFonts
    

    which will cause the output devices pdfwrite, ps2write and eps2write "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".

    This means that the two steps above can be avoided and the desired result achieved with a single command:

     gs -o somepdf-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite somepdf.pdf
    

    Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.
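
    One way to run such a test yourself is to check whether any text can still be extracted from the output. The sketch below assumes pdftotext from poppler-utils is installed (it is not mentioned in this answer), and the function name has_no_extractable_text is made up for illustration. It only looks for letters and digits, so it is a rough check rather than a proof.

```shell
# Succeeds (exit 0) when pdftotext finds no letters or digits in the PDF.
# Returns 2 when the file cannot be read.
# Usage: has_no_extractable_text FILE.pdf
has_no_extractable_text() {
    [ -r "$1" ] || return 2
    # "-" sends the extracted text to stdout. grep -q exits 0 on a match,
    # so the pipeline's status is negated.
    ! pdftotext "$1" - | grep -q '[[:alnum:]]'
}
```

    For example: has_no_extractable_text somepdf-with-outlines.pdf && echo "no searchable text found"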
