Superscript and subscript differentiation using PDFBox

拜拜、爱过 submitted on 2019-12-05 14:49:15
Ritz

I was able to identify most superscripts by looking for changes in the Y position and height between consecutive characters. Try this:

Write your own implementation of PDFTextStripper.

Add this to writePage() to convert superscripts into separate words:

    // Split the word where a glyph has both a smaller Y and a smaller height than the
    // previous one (superscript starts), or both a larger Y and a larger height (it ends).
    TextPosition prevText = lastPosition.getTextPosition();
    if ((position.getY() < prevText.getY() && position.getHeight() < prevText.getHeight())
        || (position.getY() > prevText.getY() && position.getHeight() > prevText.getHeight()))
    {
        line.add(WordSeparator.getSeparator());
    }

Then add this to writeLine() to write a tag before or after superscripts:

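    // prevY and prevHeight are assumed to be float variables declared before the word loop.
    if (word.textPositions.size() > 0)
    {
        TextPosition firstChar = word.textPositions.get(0);

        if (i == 0)
        {
            // The first word on the line is the starting point for the comparison.
            prevY = firstChar.getY();
            prevHeight = firstChar.getHeight();
        }

        if (firstChar.getY() < prevY && firstChar.getHeight() < prevHeight)
        {
            // Smaller Y and smaller height than the previous word: a superscript starts.
            output.write("<sup>");
        }
        else if (firstChar.getY() > prevY && firstChar.getHeight() > prevHeight)
        {
            // Larger Y and larger height than the previous word: the superscript has ended.
            output.write("</sup>");
        }
        writeString(word.getText(), word.getTextPositions());

        // Assumed step (the original snippet is cut off here): remember this word
        // so that the next word is compared against it.
        prevY = firstChar.getY();
        prevHeight = firstChar.getHeight();
    }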
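
To try the Y/height heuristic on its own, without patching PDFTextStripper, here is a minimal sketch. It is not PDFBox code: the Glyph class is a hypothetical stand-in for the three values read from TextPosition (text, getY(), getHeight()), and the inSup flag is my own addition to keep the tags balanced. It assumes Y is measured from the top of the page, so a raised glyph has a smaller Y.

```java
import java.util.Arrays;
import java.util.List;

public class SuperscriptTagger {

    /** Hypothetical stand-in for the values read from a PDFBox TextPosition. */
    public static final class Glyph {
        final String text;
        final float y;       // assumed to grow downward from the top of the page
        final float height;  // glyph height

        public Glyph(String text, float y, float height) {
            this.text = text;
            this.y = y;
            this.height = height;
        }
    }

    /** Wraps runs that are both higher and smaller than the preceding glyph in sup tags. */
    public static String tag(List<Glyph> glyphs) {
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        boolean inSup = false; // not in the original answer; prevents unbalanced tags
        for (Glyph g : glyphs) {
            if (prev != null) {
                boolean rises = g.y < prev.y && g.height < prev.height;
                boolean drops = g.y > prev.y && g.height > prev.height;
                if (rises && !inSup) {
                    out.append("<sup>");
                    inSup = true;
                } else if (drops && inSup) {
                    out.append("</sup>");
                    inSup = false;
                }
            }
            out.append(g.text);
            prev = g;
        }
        if (inSup) {
            out.append("</sup>"); // close a superscript that runs to the end of the line
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = Arrays.asList(
                new Glyph("x", 100f, 10f),
                new Glyph("2", 96f, 6f),   // raised and smaller than "x"
                new Glyph("+", 100f, 10f),
                new Glyph("y", 100f, 10f));
        System.out.println(tag(line)); // prints x<sup>2</sup>+y
    }
}
```

A subscript could probably be detected with the mirrored test (larger Y together with a smaller height), but I have not tried that against real PDFs.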