How can I extract subscript / superscript properly from a PDF using iTextSharp?

-上瘾入骨i 2020-11-29 13:35

iTextSharp works well for extracting plain text from PDF documents, but I'm having trouble with subscript/superscript text, which is common in technical documents.

TextChu

2 answers
  •  我在风中等你
    2020-11-29 13:59

    I just solved a similar problem, see my question. I detect subscripts as text whose baseline lies between the ascent and descent lines of the preceding text and whose height is smaller than that of the preceding text. This snippet of code might be useful:

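            // Height vectors (descent line start to ascent line start) of the preceding text and of the new chunk.
            Vector thisFacade = this.ascentLine.GetStartPoint().Subtract(this.descentLine.GetStartPoint());
            Vector infoFacade = renderInfo.GetAscentLine().GetStartPoint().Subtract(renderInfo.GetDescentLine().GetStartPoint());
            // baseVector, ascent2base and descent2base are defined outside this excerpt. The two cross products
            // point in opposite directions when ascent2base and descent2base lie on opposite sides of baseVector.
            if (baseVector.Cross(ascent2base).Dot(baseVector.Cross(descent2base)) < 0
                && infoFacade.LengthSquared < thisFacade.LengthSquared - sameHeightThreshols)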
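
    As a rough illustration of the rule described above, here is a simplified sketch that does not call iTextSharp at all. The names `ChunkMetrics`, `ScriptKind`, `ScriptDetector.Classify` and both tolerance values are made up for this example. It assumes horizontal text in PDF user space (y grows upwards), so each of the baseline, ascent and descent lines collapses to a single y coordinate; the vector code above would be needed for rotated text.

    ```csharp
    using System;

    // Hypothetical stand-in for the vertical metrics of one text chunk.
    // Assumption: horizontal text, so every line is described by its y coordinate.
    public readonly struct ChunkMetrics
    {
        public readonly double Baseline;
        public readonly double Ascent;
        public readonly double Descent;

        public ChunkMetrics(double baseline, double ascent, double descent)
        {
            Baseline = baseline;
            Ascent = ascent;
            Descent = descent;
        }

        // Distance between the descent line and the ascent line.
        public double Height => Ascent - Descent;
    }

    public enum ScriptKind { Normal, Superscript, Subscript }

    public static class ScriptDetector
    {
        // Classifies `current` relative to the chunk that precedes it.
        // Both tolerances are illustrative defaults, not values from any library.
        public static ScriptKind Classify(
            ChunkMetrics previous,
            ChunkMetrics current,
            double heightTolerance = 0.5,
            double baselineTolerance = 0.1)
        {
            // The new chunk must be noticeably smaller than the preceding text.
            bool smaller = current.Height < previous.Height - heightTolerance;

            // Its baseline must fall between the preceding descent and ascent lines.
            bool inside = current.Baseline > previous.Descent
                          && current.Baseline < previous.Ascent;

            if (!smaller || !inside)
            {
                return ScriptKind.Normal;
            }

            // The direction of the baseline shift separates super- from subscript.
            double shift = current.Baseline - previous.Baseline;
            if (shift > baselineTolerance) return ScriptKind.Superscript;
            if (shift < -baselineTolerance) return ScriptKind.Subscript;
            return ScriptKind.Normal;
        }
    }
    ```

    For example, with a preceding chunk of `new ChunkMetrics(100, 109, 97)`, a smaller chunk `new ChunkMetrics(104, 109.25, 102.25)` is classified as `Superscript` and `new ChunkMetrics(98, 103.25, 96.25)` as `Subscript`. In a real render listener the three values would presumably be taken from the start points of `renderInfo.GetBaseline()`, `renderInfo.GetAscentLine()` and `renderInfo.GetDescentLine()`.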
    

    More details after Christmas.
