Highlighting words in a PDF using iTextSharp: highlighted words are not displayed in the browser

星月不相逢 2020-11-30 13:19

Words highlighted using iTextSharp are not displayed in the browser.

[Images in the original post, labeled "Adobe", "Browser", and "CODE"; not preserved in this copy.]

2 answers
  • 2020-11-30 13:52

    You are using a Markup annotation to highlight text. That's great! There's nothing wrong with your code, nor with iText. However: not all PDF viewers support that functionality.

    If you want to see highlighted text in every PDF viewer, a (sub-optimal) workaround could be to add a yellow rectangle to the content stream under the existing content (assuming that the existing content isn't opaque).

    This is demonstrated in the HighLightByAddingContent example:

    public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        // Draw beneath the existing content of page 1
        PdfContentByte canvas = stamper.getUnderContent(1);
        canvas.saveState();
        canvas.setColorFill(BaseColor.YELLOW);
        // rectangle(x, y, width, height), lower-left corner at (36, 786)
        canvas.rectangle(36, 786, 66, 16);
        canvas.fill();
        canvas.restoreState();
        // Closing the stamper writes the modified document to dest
        stamper.close();
        reader.close();
    }
    

    In this example, we take a file named hello.pdf and add a yellow rectangle to it; the result is the file hello_highlighted.pdf.

    Note that you won't see the yellow rectangle if you add it under an opaque shape (e.g. under an image). In that case, you could add a transparent rectangle on top of the existing content.

    Update: my example was written in Java. Porting it to C# shouldn't be a problem for a developer: it is mostly a matter of capitalizing the method names, e.g. stamper.GetUnderContent(1) instead of stamper.getUnderContent(1), canvas.SaveState() instead of canvas.saveState(), and so on.

  • 2020-11-30 13:54

    First of all...

    Why does the OP's (updated) code not work

    There actually are two factors.

    First, there is an issue in the OP's code: to add a rectangle to a path, he uses

    canvas.Rectangle(rect);
    

    Unfortunately this does not do what he expects: the Rectangle class has multiple properties beyond the mere coordinates of a rectangle, foremost information about selected borders, border colors, and an interior color, and PdfContentByte.Rectangle(Rectangle) draws a rectangle according to those properties.

    In the case at hand, though, rect is used only to transport the coordinates of a rectangle, so those additional properties all are false or null. Thus, canvas.Rectangle(rect) does nothing!

    Instead the OP should use

    canvas.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);
    

    here.

    Furthermore, @Bruno mentioned in his answer

    Note that you won't see the yellow rectangle if you add it under an opaque shape (e.g. under an image).

    Unfortunately, exactly this is the case here: the document actually is a scanned document, each page being a page-filling image under which the equivalent text is drawn (probably after OCR'ing) to allow textual copy&paste.

    Thus, whatever the OP's code may draw on the UnderContent, it will be hidden by that very image.

    Thus, let's try something different...

    How to make it work

    @Bruno in his answer also indicated a solution for such a case:

    In that case, you could add a transparent rectangle on top of the existing content.

    Following this advice we replace

    canvas = stamper.GetUnderContent(pageno);
    

    by

    canvas = stamper.GetOverContent(pageno);
    
    PdfGState state = new PdfGState();
    state.FillOpacity = .3f;
    canvas.SetGState(state);
    

    Selecting the word "support" on the third document page we get:

    The yellow is quite pale here.

    Using an Opacity value of .6 instead we get

    Now the yellow is more intense but the text starts to pale out.
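
    The trade-off between the two opacity values can be worked out by hand. The following is a minimal sketch in plain Java (no iText involved), assuming the normal blend mode over an opaque backdrop, where each color channel becomes alpha * source + (1 - alpha) * backdrop. The class and method names are made up for this illustration.

```java
import java.util.Arrays;

final class FillOpacityDemo {
    /** Blends one color channel over an opaque backdrop (normal blend mode, values in [0, 1]). */
    static double composite(double backdrop, double source, double alpha) {
        return alpha * source + (1.0 - alpha) * backdrop;
    }

    /** Applies composite() to each RGB channel. */
    static double[] compositeRgb(double[] backdrop, double[] source, double alpha) {
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            result[i] = composite(backdrop[i], source[i], alpha);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] yellow = {1.0, 1.0, 0.0};
        double[] whitePaper = {1.0, 1.0, 1.0};
        double[] blackText = {0.0, 0.0, 0.0};
        // The two opacity values tried in the answer
        for (double alpha : new double[] {0.3, 0.6}) {
            System.out.println("alpha " + alpha
                    + ": paper -> " + Arrays.toString(compositeRgb(whitePaper, yellow, alpha))
                    + ", text -> " + Arrays.toString(compositeRgb(blackText, yellow, alpha)));
        }
    }
}
```

    Under these assumptions, a higher opacity removes more blue from the white paper (a stronger yellow) but also lifts the black text towards yellow by the same factor, which matches the paling described above.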

    For tasks like this I actually prefer using the blend mode Darken. This can be done by using

    state.BlendMode = new PdfName("Darken");
    

    instead of state.FillOpacity = .3f. This results in

    This IMO looks better.

    How the client did it

    The OP commented

    Client have given a pdf. In that, they highlighted text, the highlighted text is displayed in browser

    The client's PDF actually uses annotations, just like the OP's original code. In contrast, however, each of the client's annotations contains an appearance stream, which the highlight annotations generated by iText do not.

    Supplying an appearance is optional and PDF viewers indeed should generate an appearance if none is given. Obviously, though, there are numerous PDF viewers which rely on appearances the PDF brings along.

    By the way, the appearances in the client's PDF actually use the blend mode Multiply. For underlying white and black colors, Darken and Multiply have the same result.
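
    The equivalence on white and black can be checked with the per-channel blend functions, Darken being min(backdrop, source) and Multiply being backdrop * source. The following is a small sketch in plain Java (no iText involved); the class name and the pale yellow source color are made up for this illustration.

```java
final class BlendModeDemo {
    // Separable blend functions, applied per color channel with values in [0, 1].
    static double darken(double backdrop, double source) {
        return Math.min(backdrop, source);
    }

    static double multiply(double backdrop, double source) {
        return backdrop * source;
    }

    public static void main(String[] args) {
        double[] paleYellow = {1.0, 1.0, 0.6}; // hypothetical highlight color (RGB)
        double[] backdrops = {1.0, 0.0, 0.5};  // white paper, black text, mid grey
        for (double cb : backdrops) {
            for (double cs : paleYellow) {
                System.out.printf("backdrop=%.1f source=%.1f darken=%.2f multiply=%.2f%n",
                        cb, cs, darken(cb, cs), multiply(cb, cs));
            }
        }
    }
}
```

    On a white or black backdrop both functions agree for any source value. They only diverge when both the backdrop and the source channel lie strictly between 0 and 1, e.g. grey scan noise under a pale highlight color.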

    Making it work with annotations

    In a comment the OP wondered

    Please one more doubt, if the user wrongly highlighted then how to remove yellow color(or change yellow to white)? i changed yellow to white but it's not working. canvas.SetColorFill(BaseColor.WHITE);

    Undoing a change to the page content generally is more difficult than undoing the addition of an annotation. Thus, let's make the OP's original code work, too, by adding an appearance stream to the highlight annotations.

    As the OP reported in another comment, his first attempt to add an appearance stream failed:

    PdfAppearance appearance = PdfAppearance.CreateAppearance(stamper.Writer, rect.Width, rect.Height);
    appearance.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);
    appearance.SetColorFill(BaseColor.WHITE);
    appearance.Fill();
    highlight.SetAppearance( PdfAnnotation.APPEARANCE_NORMAL, appearance );
    stamper.AddAnnotation(highlight, pageno);
    

    but it's not working.

    The problems in his attempt are:

    • The origin of the appearance template is in the lower left corner of the annotation area, not of the page. To color the area in question, therefore, the rectangle must have its lower left at (0, 0).
    • Strictly speaking the color must be set before starting the path building.
    • A different color than white should be used for highlighting.
    • Transparency or an appropriate blend mode should be used to allow the original, marked text to shine through.
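
    The first point is purely a change of coordinate origin. The following is a minimal sketch in plain Java (no iText involved): Box is a made-up stand-in for a rectangle, not the iText Rectangle class, and the numbers are only sample values. It assumes that anything drawn outside the appearance's own width and height does not show up.

```java
final class AppearanceCoordinates {
    /** Minimal stand-in for a rectangle given as lower-left corner plus size. */
    static final class Box {
        final float left, bottom, width, height;

        Box(float left, float bottom, float width, float height) {
            this.left = left;
            this.bottom = bottom;
            this.width = width;
            this.height = height;
        }
    }

    /** Re-expresses a page-space box relative to the annotation's lower-left corner. */
    static Box toAppearanceSpace(Box pageBox, Box annotation) {
        return new Box(pageBox.left - annotation.left, pageBox.bottom - annotation.bottom,
                pageBox.width, pageBox.height);
    }

    /** True if the box lies entirely inside an appearance of the given size. */
    static boolean fitsInside(Box box, float width, float height) {
        return box.left >= 0 && box.bottom >= 0
                && box.left + box.width <= width && box.bottom + box.height <= height;
    }

    public static void main(String[] args) {
        Box annotation = new Box(36f, 786f, 66f, 16f); // sample page coordinates
        // Page coordinates used directly, as in the failed attempt
        System.out.println(fitsInside(annotation, annotation.width, annotation.height));
        // Coordinates shifted so that the annotation's lower left is (0, 0)
        Box local = toAppearanceSpace(annotation, annotation);
        System.out.println(fitsInside(local, annotation.width, annotation.height));
    }
}
```

    With the sample values, the page-space rectangle falls outside the 66 by 16 appearance, while the shifted one starts at (0, 0) and fills it exactly.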

    Thus, the following code shows how to do it.

    private void highlightPDFAnnotation(string outputFile, string highLightFile, int pageno, string[] splitText)
    {
        PdfReader reader = new PdfReader(outputFile);
        using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            using (PdfStamper stamper = new PdfStamper(reader, fs))
            {
                myLocationTextExtractionStrategy strategy = new myLocationTextExtractionStrategy();
                strategy.UndercontentHorizontalScaling = 100;
    
                string currentText = PdfTextExtractor.GetTextFromPage(reader, pageno, strategy);
                for (int i = 0; i < splitText.Length; i++)
                {
                    List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(splitText[i].Trim(), StringComparison.CurrentCultureIgnoreCase);
                    foreach (Rectangle rect in MatchesFound)
                    {
                        float[] quad = { rect.Left - 3.0f, rect.Bottom, rect.Right, rect.Bottom, rect.Left - 3.0f, rect.Top + 1.0f, rect.Right, rect.Top + 1.0f };
                        //Create our highlight
                        PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
                        //Set the color
                        highlight.Color = BaseColor.YELLOW;
    
                        PdfAppearance appearance = PdfAppearance.CreateAppearance(stamper.Writer, rect.Width, rect.Height);
                        PdfGState state = new PdfGState();
                        state.BlendMode = new PdfName("Multiply");
                        appearance.SetGState(state);
                        // Set the fill color before starting the path
                        appearance.SetColorFill(BaseColor.YELLOW);
                        appearance.Rectangle(0, 0, rect.Width, rect.Height);
                        appearance.Fill();
    
                        highlight.SetAppearance(PdfAnnotation.APPEARANCE_NORMAL, appearance);
    
                        //Add the annotation
                        stamper.AddAnnotation(highlight, pageno);
                    }
                }
            }
        }
        reader.Close();
    }
    

    These annotations are displayed by Chrome, too, and, being annotations, they can easily be removed.
