I was going through the iText API docs and was able to create a PDF with a watermark image or text, but I did not find a method to get/extract the watermark content from a PDF.
How can I extract watermark content using the iText APIs? Or is there any other way to validate watermark content?
There is nothing special about watermarks in PDFs in contrast to regular page content. They either
appear pretty early in the content stream, so that other content later in the stream is drawn above them, or they
appear pretty late in the content stream but have some kind of transparency applied.
Actually, there is another type of watermark which is special, the so-called Watermark Annotations. As these annotations can easily be lost when documents are merged or otherwise manipulated, though, they are hardly ever used.
Furthermore, different PDF-generating software suites that offer a way to add watermarks each do so in their own way. Thus, you cannot even recognize watermarks by some special operations done in some specific, unique pattern.
Even the iText examples you referred to apply different kinds of watermarks:
MovieCountries2 simply draws some large gray text along an angled baseline.
StampStationery copies a complete page from some PDF (which itself may visually have foreground and background material) into a separate object inside the target PDF and adds a reference to this object at the beginning of every page of the target.
InsertPages similarly references a page from some PDF on every newly generated target document page.
Thus, blind watermark extraction is virtually impossible.
You might try some validation, though, if you know what you are searching for. In that case you do not search some fixed watermark stream (no such thing exists in PDF) but instead the whole page content.
iText offers the classes of the parser package, which allow extraction of text and/or bitmap images from content streams. Look at the samples referenced from the keywords PARSING PDF > EXTRACTING IMAGES and PARSING PDF > EXTRACTING TEXT.
You merely have to check whether the image or text which you expect can be found by these classes positioned and styled as you expect.
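As a rough illustration of the text case, here is a minimal sketch of such a check. It assumes you have already extracted the text of each page into a string, for example with iText 5's PdfTextExtractor.getTextFromPage(reader, pageNumber), and it only tests whether the expected watermark text is present, not where it is positioned or how it is styled. The class and method names (WatermarkCheck, normalize, pagesMissingWatermark) are made up for this example and are not part of iText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not an iText class: checks extracted page texts
// for an expected watermark string.
final class WatermarkCheck {

    // Drop all whitespace and lowercase the text. Extraction of rotated or
    // widely spaced watermark text may insert or lose spaces and line breaks,
    // so the comparison ignores them (an assumption of this sketch).
    static String normalize(String text) {
        return text.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // Returns the 1-based numbers of all pages whose extracted text
    // does not contain the expected watermark text.
    static List<Integer> pagesMissingWatermark(List<String> pageTexts, String expected) {
        String needle = normalize(expected);
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (!normalize(pageTexts.get(i)).contains(needle)) {
                missing.add(i + 1);
            }
        }
        return missing;
    }
}
```

An empty result means the expected text was found on every page. Because whitespace is ignored, the check can also match across word boundaries of regular page text, so for a stricter validation you would additionally have to look at position and styling, e.g. with a custom render listener from the parser package.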