iTextSharp works well for extracting plain text from PDF documents, but I'm having trouble with subscript/superscript text, which is common in technical documents.
I just solved a similar problem; see my question. I detect subscripts as text whose baseline lies between the ascent and descent lines of the preceding text. This snippet of code might be useful:
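// Height vectors (descent line to ascent line) of the preceding text and of the new chunk.
Vector thisFacade = this.ascentLine.GetStartPoint().Subtract(this.descentLine.GetStartPoint());
Vector infoFacade = renderInfo.GetAscentLine().GetStartPoint().Subtract(renderInfo.GetDescentLine().GetStartPoint());
// True when the two cross products point in opposite directions and the new chunk is noticeably shorter.
if (baseVector.Cross(ascent2base).Dot(baseVector.Cross(descent2base)) < 0
    && infoFacade.LengthSquared < thisFacade.LengthSquared - sameHeightThreshols)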
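
A minimal, self-contained sketch of this check, for illustration only. The snippet above does not show how baseVector, ascent2base and descent2base are defined, so the definitions below are assumptions, and ChunkGeometry, ScriptDetector and IsSubOrSuperscript are hypothetical names rather than iTextSharp API. It uses System.Numerics.Vector2 instead of iTextSharp's Vector so that it runs without the library; in real code the four points would presumably come from TextRenderInfo.GetBaseline(), GetAscentLine() and GetDescentLine().

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for the geometry of one text chunk (not an iTextSharp type).
public sealed class ChunkGeometry
{
    public Vector2 BaselineStart, BaselineEnd, AscentStart, DescentStart;

    public ChunkGeometry(Vector2 baselineStart, Vector2 baselineEnd,
                         Vector2 ascentStart, Vector2 descentStart)
    {
        BaselineStart = baselineStart;
        BaselineEnd = baselineEnd;
        AscentStart = ascentStart;
        DescentStart = descentStart;
    }
}

public static class ScriptDetector
{
    // z component of the 3D cross product of two vectors lying in the page plane.
    private static float Cross(Vector2 a, Vector2 b) => a.X * b.Y - a.Y * b.X;

    // Returns true when `next` looks like a subscript or superscript of `previous`.
    // sameHeightThreshold is compared against squared lengths, as in the snippet above.
    public static bool IsSubOrSuperscript(ChunkGeometry previous, ChunkGeometry next,
                                          float sameHeightThreshold)
    {
        // Assumed definitions: direction of the new baseline, and vectors from a
        // point on that baseline to the previous chunk's ascent and descent lines.
        Vector2 baseVector = next.BaselineEnd - next.BaselineStart;
        Vector2 ascent2base = previous.AscentStart - next.BaselineStart;
        Vector2 descent2base = previous.DescentStart - next.BaselineStart;

        // Opposite signs mean the previous ascent and descent lines lie on
        // opposite sides of the new baseline, i.e. the baseline runs between them.
        bool baselineBetween =
            Cross(baseVector, ascent2base) * Cross(baseVector, descent2base) < 0;

        // The new chunk must also be noticeably shorter than the previous one.
        float previousHeightSq = (previous.AscentStart - previous.DescentStart).LengthSquared();
        float nextHeightSq = (next.AscentStart - next.DescentStart).LengthSquared();
        bool smaller = nextHeightSq < previousHeightSq - sameHeightThreshold;

        return baselineBetween && smaller;
    }
}
```

Note that ordinary text on the same line also has its baseline between the previous ascent and descent lines, so the height comparison is what separates scripts from normal text. As a consequence, this sketch would also flag smaller text set on the same baseline, which may or may not be what you want.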
More details after Christmas.