Removing text from PDF

Submitted by 血红的双手。 on 2019-11-29 17:00:38
Bruno Lowagie
  1. The /Contents entry of a page dictionary isn't always an array. It should be evident that GetAsArray() returns null if the content is stored as a single stream.
  2. Suppose you use GetAsStream() and remove all the text from the stream: you may still have text content in XObjects. That text won't be referenced from a content stream, but iText won't be able to remove the XObjects as 'unused objects' because they are still referenced from the /Resources entry in the page dictionary.
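
The two points above can be sketched with a toy model. This is not the iText API: the page is represented as a plain Python dict, and the helper names `content_streams` and `xobject_names` are made up for illustration. It only shows the shape of the check, namely that /Contents may be a single stream or an array of streams, and that XObjects stay listed under /Resources regardless of what the content stream contains.

```python
from typing import Any, Dict, List


def content_streams(page: Dict[str, Any]) -> List[bytes]:
    """Return the page's content streams as a list.

    In this toy model /Contents is either one bytes object (a single
    stream) or a list of bytes objects (an array of streams).
    """
    contents = page.get("/Contents")
    if contents is None:
        return []
    if isinstance(contents, (bytes, bytearray)):
        return [bytes(contents)]      # single stream: wrap it in a list
    return [bytes(s) for s in contents]  # array of streams


def xobject_names(page: Dict[str, Any]) -> List[str]:
    """List the XObjects still referenced from the page's /Resources."""
    resources = page.get("/Resources", {})
    return sorted(resources.get("/XObject", {}).keys())


# Hypothetical page: text already stripped from the content stream,
# but a form XObject (which may itself contain text) is still registered.
page = {
    "/Contents": b"q /Fm0 Do Q",
    "/Resources": {"/XObject": {"/Fm0": b"BT (hidden text) Tj ET"}},
}
print(content_streams(page))
print(xobject_names(page))
```

Code that handles only one of the two /Contents shapes will silently skip pages that use the other, which is the failure described in point 1.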

Please read ISO-32000-1 to find out what you're doing wrong.

Now that you've updated your question, and revealed the motivation of the intended measure, let me tell you the truth:

  • These measures will in no way reduce the size of PDFs.

  • Instead they'll lead to a much larger file:

    1. First, removing text and fonts may shrink the file slightly, yes.

    2. Then, converting the remains of the page to a bitmap will certainly increase the size hugely (unless you accept very low image quality).

    3. Finally, 'pasting' the text over it again will increase the file size once more (very likely by the same amount you saved in the first step).

It's not a good plan at all.
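
A rough back-of-the-envelope estimate shows why the bitmap step dominates. The sketch below computes the raw, uncompressed size of a rasterized page; the page size (US Letter), the resolution (300 dpi) and the helper name `raw_bitmap_bytes` are assumptions chosen for illustration. Real files will be smaller after image compression, but the order of magnitude is the point.

```python
def raw_bitmap_bytes(width_in: float, height_in: float, dpi: int,
                     bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of a page rendered as a bitmap.

    bytes_per_pixel=3 corresponds to 8-bit RGB.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# One US Letter page (8.5 x 11 inches) at 300 dpi, RGB:
size = raw_bitmap_bytes(8.5, 11, 300)
print(size)  # 25245000 bytes, roughly 25 MB before compression
```

Dropping to 72 dpi cuts this to about 1.5 MB per page before compression, which is the "very low image quality" trade-off mentioned in step 2.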

If you provide (a link to) one of your typical sample PDF files, I can probably come up with a Ghostscript (plus other tools) command line that works out of the box and shrinks the PDF more efficiently.

To remove all text from a PDF, the easiest solution is to use Ghostscript:

gs -o output_no_text.pdf -sDEVICE=pdfwrite -dFILTERTEXT input.pdf
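
If the command needs to be run from a script, a thin wrapper along these lines may help. It is only a sketch: the function names are made up, it assumes the Ghostscript executable is called `gs` and is on the PATH (on Windows it is usually named differently), and it passes exactly the switches shown in the command above.

```python
import shutil
import subprocess
from typing import List


def build_filter_text_command(input_pdf: str, output_pdf: str,
                              gs: str = "gs") -> List[str]:
    """Build the argument list for the Ghostscript command shown above."""
    return [gs, "-o", output_pdf, "-sDEVICE=pdfwrite", "-dFILTERTEXT", input_pdf]


def remove_text(input_pdf: str, output_pdf: str, gs: str = "gs") -> None:
    """Run Ghostscript to write a copy of input_pdf without its text."""
    if shutil.which(gs) is None:
        # Fail early with a clear message instead of a cryptic OSError.
        raise FileNotFoundError(f"{gs!r} not found on PATH")
    # check=True raises CalledProcessError if Ghostscript exits non-zero.
    subprocess.run(build_filter_text_command(input_pdf, output_pdf, gs), check=True)


print(build_filter_text_command("input.pdf", "output_no_text.pdf"))
```

Calling `remove_text("input.pdf", "output_no_text.pdf")` then performs the conversion. Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.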