How to extract PDF watermark content using iText APIs

你的背包 2020-12-22 00:02

I was going through the iText API docs and was able to create a PDF with a watermark image or text, but I did not find a method to get/extract the watermark content from a PDF.

  • 2020-12-22 00:42

    How to extract watermark content using iText apis? Or is there any other way to validate watermark content?

    Extracting watermark content?

    There is nothing special about watermarks in PDFs compared to regular page content. They merely

    • appear early in the content stream, so that content appearing later in the stream is drawn above them; or they

    • appear late in the content stream but have some kind of transparency applied.
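    Both placements can be reproduced with iText's PdfStamper. The following is a minimal sketch, assuming iText 5.5.x (com.itextpdf:itextpdf); `WatermarkPlacement`, `samplePdf` and `stamp` are hypothetical names. `getUnderContent` writes the watermark before the existing page content, which is then painted over it; `getOverContent` writes it after the existing content, and a `PdfGState` fill opacity makes it see-through.

```java
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfGState;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch assuming iText 5.5.x; class and method names are hypothetical.
public class WatermarkPlacement {

    /** Builds a one-page PDF in memory that only contains the given body text. */
    public static byte[] samplePdf(String body) throws DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        document.add(new Paragraph(body));
        document.close();
        return out.toByteArray();
    }

    /**
     * Draws the watermark text diagonally on every page.
     * underContent == true:  written before the existing content, which then paints over it.
     * underContent == false: written after the existing content, with 30% fill opacity.
     */
    public static byte[] stamp(byte[] pdf, String text, boolean underContent)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(pdf);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfStamper stamper = new PdfStamper(reader, out);
        Font font = new Font(Font.FontFamily.HELVETICA, 60, Font.BOLD, BaseColor.LIGHT_GRAY);
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            Rectangle size = reader.getPageSize(i);
            float x = (size.getLeft() + size.getRight()) / 2;
            float y = (size.getBottom() + size.getTop()) / 2;
            PdfContentByte canvas = underContent
                    ? stamper.getUnderContent(i)  // start of the page content
                    : stamper.getOverContent(i);  // end of the page content
            canvas.saveState();
            if (!underContent) {
                PdfGState transparency = new PdfGState();
                transparency.setFillOpacity(0.3f); // keeps the content below visible
                canvas.setGState(transparency);
            }
            ColumnText.showTextAligned(canvas, Element.ALIGN_CENTER, new Phrase(text, font), x, y, 45);
            canvas.restoreState();
        }
        stamper.close();
        reader.close();
        return out.toByteArray();
    }
}
```

    Either way the result is ordinary page content, so a parser sees no structural difference between the watermark and the rest of the page.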

    Actually, there is another type of watermark which is special: the so-called Watermark Annotations. As these annotations can easily be lost when documents are merged or otherwise manipulated, though, they are hardly ever used.
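    Watermark annotations, at least, are easy to find when they do survive: the PDF specification marks them with /Subtype /Watermark in a page's /Annots array. A minimal sketch, assuming iText 5.5.x; `WatermarkAnnotations` and `pagesWithWatermarkAnnotations` are hypothetical names.

```java
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch assuming iText 5.5.x; class and method names are hypothetical.
public class WatermarkAnnotations {
    // Annotation subtype used by the PDF specification for watermark annotations.
    private static final PdfName WATERMARK = new PdfName("Watermark");

    /** Returns one page number (1-based) per annotation whose /Subtype is /Watermark. */
    public static List<Integer> pagesWithWatermarkAnnotations(byte[] pdf) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        List<Integer> pages = new ArrayList<>();
        try {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                PdfArray annots = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
                if (annots == null) {
                    continue; // page has no annotations at all
                }
                for (int j = 0; j < annots.size(); j++) {
                    PdfDictionary annot = annots.getAsDict(j); // resolves indirect references
                    if (annot != null && WATERMARK.equals(annot.getAsName(PdfName.SUBTYPE))) {
                        pages.add(i);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return pages;
    }
}
```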

    Furthermore, the different PDF-generating software suites that offer a way to add watermarks each do so in their own way. Thus, you cannot even recognize watermarks by some special operations arranged in a specific, unique pattern.

    Even the iText examples you referred to apply different kinds of watermarks:

    • MovieCountries2 simply draws some large gray text along an angled baseline.
    • StampStationery copies a complete page from some PDF (which itself may visually have foreground and background material) into a separate object inside the target PDF and adds a reference to this object at the beginning of every page of the target.
    • InsertPages similarly references a page from some PDF on every newly generated target document page.
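    In the StampStationery and InsertPages cases, the page ends up drawing an imported page, which iText writes as a form XObject. A minimal sketch, assuming iText 5.5.x, that lists the form XObjects a page's resources offer; `FormXObjectLister` and `formXObjectNames` are hypothetical names. Being listed does not prove that an XObject is actually drawn, let alone that it is a watermark, so this only yields candidates.

```java
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStream;
import java.util.ArrayList;
import java.util.List;

// Sketch assuming iText 5.5.x; class and method names are hypothetical.
public class FormXObjectLister {

    /** Returns the resource names (e.g. "/Xf1") of the form XObjects in a page's resources. */
    public static List<String> formXObjectNames(PdfReader reader, int pageNum) {
        List<String> names = new ArrayList<>();
        PdfDictionary resources = reader.getPageN(pageNum).getAsDict(PdfName.RESOURCES);
        if (resources == null) {
            return names;
        }
        PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
        if (xobjects == null) {
            return names; // no XObjects at all on this page
        }
        for (PdfName name : xobjects.getKeys()) {
            PdfStream xobject = xobjects.getAsStream(name);
            // Image XObjects have /Subtype /Image; only forms can hold a whole imported page.
            if (xobject != null && PdfName.FORM.equals(xobject.getAsName(PdfName.SUBTYPE))) {
                names.add(name.toString());
            }
        }
        return names;
    }
}
```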

    Thus, blind watermark extraction is virtually impossible.

    Validating watermark content!

    You might attempt some validation, though, if you know what you are searching for. Instead of searching for some fixed watermark stream (no such thing exists in PDF), you search the whole page content.
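    A minimal sketch of such a search, assuming iText 5.5.x: `PdfTextExtractor.getTextFromPage` parses the complete content of a page, including form XObjects it draws, so it does not matter where in the stream the watermark text sits. `WatermarkTextSearch` and `pagesMissing` are hypothetical names, and a plain `contains` check misses watermarks drawn as outlines or images, or whose extracted text comes out differently (for example with inserted spaces).

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch assuming iText 5.5.x; class and method names are hypothetical.
public class WatermarkTextSearch {

    /** Returns the 1-based numbers of pages whose extracted text does NOT contain the expected text. */
    public static List<Integer> pagesMissing(byte[] pdf, String expected) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        List<Integer> missing = new ArrayList<>();
        try {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                // Extracts all text of the page, wherever in the content stream it was drawn.
                String text = PdfTextExtractor.getTextFromPage(reader, i);
                if (!text.contains(expected)) {
                    missing.add(i);
                }
            }
        } finally {
            reader.close();
        }
        return missing;
    }
}
```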

    iText offers the classes of the parser package which allow extraction of text and/or bitmap images from content streams. Look at the samples referenced from the keywords PARSING PDF > EXTRACTING IMAGES and PARSING PDF > EXTRACTING TEXT.
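    On the image side, `PdfReaderContentParser` calls a `RenderListener` for every image the page draws. A minimal sketch, assuming iText 5.5.x: it records where each image is placed by reading the image's transformation matrix; `info.getImage()` would give access to the image data itself if you also want to compare bytes or pixels. `ImagePlacements` and `Placement` are hypothetical names, and reading width and height straight from the matrix is only valid for images that are neither rotated nor skewed.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.Matrix;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch assuming iText 5.5.x; class names are hypothetical.
public class ImagePlacements implements RenderListener {

    /** Lower-left corner and size of one drawn image, in user space units. */
    public static final class Placement {
        public final float x, y, width, height;

        Placement(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    private final List<Placement> placements = new ArrayList<>();

    @Override
    public void renderImage(ImageRenderInfo info) {
        // The image CTM maps the unit square onto the page: [w 0 0 h x y] for unrotated images.
        Matrix ctm = info.getImageCTM();
        placements.add(new Placement(ctm.get(Matrix.I31), ctm.get(Matrix.I32),
                ctm.get(Matrix.I11), ctm.get(Matrix.I22)));
    }

    @Override public void beginTextBlock() { }
    @Override public void renderText(TextRenderInfo info) { }
    @Override public void endTextBlock() { }

    /** Returns the placements of all images drawn on the given page (1-based). */
    public static List<Placement> imagesOnPage(byte[] pdf, int pageNum) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        try {
            ImagePlacements listener = new ImagePlacements();
            new PdfReaderContentParser(reader).processContent(pageNum, listener);
            return listener.placements;
        } finally {
            reader.close();
        }
    }
}
```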

    You merely have to check whether the image or text which you expect can be found by these classes positioned and styled as you expect.
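    For text, the same listener mechanism reports each string drawn by a text-showing operator together with its baseline in page coordinates, which lets you check orientation as well as content. A minimal sketch, assuming iText 5.5.x: `WatermarkTextCheck` (a hypothetical name) accepts a page if some text chunk contains the expected string at roughly the expected angle, e.g. 45 degrees for a diagonal watermark. It only matches text shown as a single string; a watermark drawn glyph by glyph would have to be reassembled from several chunks first.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import com.itextpdf.text.pdf.parser.Vector;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch assuming iText 5.5.x; class and method names are hypothetical.
public class WatermarkTextCheck implements RenderListener {

    /** One drawn string with its font, baseline start point and baseline angle in degrees. */
    public static final class Chunk {
        public final String text, font;
        public final float x, y;
        public final double angle;

        Chunk(String text, String font, float x, float y, double angle) {
            this.text = text;
            this.font = font;
            this.x = x;
            this.y = y;
            this.angle = angle;
        }
    }

    private final List<Chunk> chunks = new ArrayList<>();

    @Override
    public void renderText(TextRenderInfo info) {
        // Baseline in user space, i.e. with text matrix and CTM (including rotation) applied.
        Vector start = info.getBaseline().getStartPoint();
        Vector end = info.getBaseline().getEndPoint();
        double angle = Math.toDegrees(Math.atan2(end.get(Vector.I2) - start.get(Vector.I2),
                end.get(Vector.I1) - start.get(Vector.I1)));
        chunks.add(new Chunk(info.getText(), info.getFont().getPostscriptFontName(),
                start.get(Vector.I1), start.get(Vector.I2), angle));
    }

    @Override public void beginTextBlock() { }
    @Override public void endTextBlock() { }
    @Override public void renderImage(ImageRenderInfo info) { }

    /** True if a chunk on the page contains the expected text at the expected angle (within tolerance). */
    public static boolean hasWatermark(byte[] pdf, int pageNum, String expected,
                                       double expectedAngle, double toleranceDeg) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        try {
            WatermarkTextCheck listener =
                    new PdfReaderContentParser(reader).processContent(pageNum, new WatermarkTextCheck());
            for (Chunk c : listener.chunks) {
                if (c.text.contains(expected) && Math.abs(c.angle - expectedAngle) <= toleranceDeg) {
                    return true;
                }
            }
            return false;
        } finally {
            reader.close();
        }
    }
}
```

    The collected `x`, `y` and `font` values are not used by `hasWatermark`; they are there so the check can be extended to position and font if your expected watermark is specified that precisely.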
