Create a tiff with only text and no images from a postscript file with ghostscript

喜你入骨 submitted on 2019-12-31 04:58:06

Question


Is it possible to convert a PostScript file (created from a PDF document with readable text and images) into a TIFF file that contains only the text, without the images?

For example, by adding something like a maximum buffer size, so that images are removed and only the text remains?

And if boxes and lines around text could be removed as well that would be awesome.

Best regards!


Answer 1:


You can redefine the various 'image' operators so that they don't do anything:

/image {
 type /dicttype eq not { % 'type' consumes the top operand (the dict, or the data source)
   pop pop pop pop   % non-dictionary form: remove the remaining four operands
 } if
} bind def

/imagemask {
 type /dicttype eq not { % 'type' consumes the top operand (the dict, or the data source)
   pop pop pop pop   % non-dictionary form: remove the remaining four operands
 } if
} bind def

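/colorimage {
  % operands: width height bits/comp matrix datasrc(s) multi ncomp
  exch                   % bring 'multi' above 'ncomp'
  {
    { pop } repeat       % multi true: one data source per colour component
  } {
    pop pop              % multi false: ncomp and the single data source
  } ifelse
  pop pop pop pop        % matrix, bits/comp, height, width
} bind def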

Save that as a file, and add the file to your GS invocation.
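As a rough sketch of what "add the file to your GS invocation" can look like, the Python snippet below builds a Ghostscript command line that lists the redefinition file before the input file, so the overrides are already in place when the document is interpreted. The file names `noimages.ps`, `input.ps` and `text-only.tif`, the helper name `build_tiff_command`, the `tiffg4` output device and the 300 dpi resolution are all assumptions chosen for illustration; none of them come from the answer itself.

```python
import shlex


def build_tiff_command(prologue, source, output, device="tiffg4", dpi=300):
    """Return a gs argument list that interprets `prologue` before `source`.

    All file names are hypothetical; the device and resolution are
    assumptions, not values taken from the answer.
    """
    return [
        "gs",
        f"-sDEVICE={device}",  # TIFF output device (assumed: tiffg4, 1-bit)
        f"-r{dpi}",            # rendering resolution in dpi (assumed)
        "-o", output,          # output file; -o also sets -dBATCH -dNOPAUSE
        prologue,              # operator redefinitions, interpreted first
        source,                # the PostScript file to render
    ]


cmd = build_tiff_command("noimages.ps", "input.ps", "text-only.tif")
print(shlex.join(cmd))  # paste into a shell, or pass `cmd` to subprocess.run
```

The order of the two input files is the point of the example: Ghostscript processes them left to right, so the redefinitions must come first.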

You can remove linework similarly by redefining the stroke operator:

/stroke {
  newpath
} bind def

rectstroke is harder; I suggest you read the PLRM (the PostScript Language Reference Manual) if you need that one.

You may also want to redefine the fill operators:

/fill {
  newpath
} bind def

/eofill {
  newpath
} bind def

Beware! Some text is not drawn using the text 'show' operators, but is constructed from linework or drawn as images. Such text will disappear too if you redefine the operators as shown above.

Note that the PDF interpreter often doesn't allow re-definition of operators, so you may first have to convert your PDF file to PostScript, using the ps2write device, then run the resulting file through GS to get a TIFF file.
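The two-step route described above could be sketched as follows. This is only an illustration: the helper name `build_pdf_to_tiff_commands`, the file names (`input.pdf`, `converted.ps`, `noimages.ps`, `text-only.tif`) and the `tiffg4` device for the second step are assumptions, and `noimages.ps` stands for whatever file holds your operator redefinitions.

```python
import shlex


def build_pdf_to_tiff_commands(pdf, prologue, tiff, intermediate="converted.ps"):
    """Return two gs argument lists: PDF -> PostScript, then PostScript -> TIFF.

    File names and the TIFF device are hypothetical placeholders.
    """
    # Step 1: convert the PDF to PostScript with the ps2write device.
    to_ps = ["gs", "-sDEVICE=ps2write", "-o", intermediate, pdf]
    # Step 2: render the PostScript, with the redefinitions loaded first.
    to_tiff = ["gs", "-sDEVICE=tiffg4", "-o", tiff, prologue, intermediate]
    return to_ps, to_tiff


for cmd in build_pdf_to_tiff_commands("input.pdf", "noimages.ps", "text-only.tif"):
    print(shlex.join(cmd))
```

The redefinition file is only passed in the second step, because that is where the PostScript interpreter (rather than the PDF interpreter) executes the image and stroke operators.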




Answer 2:


gs -sDEVICE=bitrgbtags -o out.tags <myfile>

This will create a ppm file with tags; the tags label each pixel as text, vector, image, etc.

Then you can use the C programs in ghostpdl/tools/GOT to process the image. It sounds like you want to write a new C program to set each non-text pixel to the background color, or maybe just white. This is fairly straightforward with the example C programs in the GOT subdirectory as a guide (if you are a programmer). Then you would convert the ppm to tiff. Ken provided a different way of doing this that doesn't require pixel processing.
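The per-pixel step itself is small. The sketch below shows the idea in Python rather than C, working on an in-memory buffer instead of parsing the file. It assumes 4 bytes per pixel in the order tag, R, G, B and assumes that bit 0x01 of the tag marks text; both assumptions, like the name `keep_only_text`, are illustrative and should be checked against the GOT sources before relying on them.

```python
TEXT_TAG = 0x01  # assumed tag bit for text; verify against the GOT sources


def keep_only_text(pixels, text_tag=TEXT_TAG, background=(255, 255, 255)):
    """Return RGB bytes with every non-text pixel set to `background`.

    `pixels` is assumed to hold 4 bytes per pixel in the order
    tag, R, G, B, with any file header already stripped.
    """
    if len(pixels) % 4:
        raise ValueError("expected 4 bytes per pixel (tag, R, G, B)")
    out = bytearray()
    for i in range(0, len(pixels), 4):
        if pixels[i] & text_tag:
            out += pixels[i + 1:i + 4]  # text pixel: keep its colour
        else:
            out += bytes(background)    # anything else: paint the background
    return bytes(out)


# Two pixels: a black text pixel and a red pixel with a non-text tag (0x02).
sample = bytes([0x01, 0, 0, 0, 0x02, 255, 0, 0])
print(keep_only_text(sample).hex())  # prints 000000ffffff
```

The result is plain RGB triples with the tag byte dropped, so it could be written out after a standard binary PPM (`P6`) header and then converted to TIFF with an image tool of your choice.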



Source: https://stackoverflow.com/questions/6437564/create-a-tiff-with-only-text-and-no-images-from-a-postscript-file-with-ghostscri
