How to determine artificial bold style, artificial italic style and artificial outline style of text using PDFBox

花落未央 2020-11-28 16:41

I am using PDFBox to validate a PDF document. One requirement is to check whether the following types of text are present in a PDF:

Artificial Bold style text

2 answers

余生分开走 (original poster)

2020-11-28 16:53

My solution to this problem was to create a new class that extends PDFTextStripper and overrides the following method, making it public:

getCharactersByArticle()

Note: this was written against PDFBox version 1.8.5.

CustomPDFTextStripper class

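import java.io.IOException;
import java.util.List;
import java.util.Vector;

import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class CustomPDFTextStripper extends PDFTextStripper
{
    public CustomPDFTextStripper() throws IOException {
        super();
    }

    // Expose the stripper's collected TextPositions (grouped by article) to callers.
    public Vector<List<TextPosition>> getCharactersByArticle() {
        return charactersByArticle;
    }
}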

This way I can parse the PDF document and then get the TextPosition objects from a custom extraction function:

private void extractTextPosition() throws FileNotFoundException, IOException {

    // "pdf" is the file to inspect; it is declared elsewhere in the enclosing class.
    PDFParser parser = new PDFParser(new FileInputStream(pdf));
    parser.parse();
    StringWriter outString = new StringWriter();
    CustomPDFTextStripper stripper = new CustomPDFTextStripper();
    // writeText() runs the extraction, which fills charactersByArticle.
    stripper.writeText(parser.getPDDocument(), outString);
    Vector<List<TextPosition>> vectorlistoftps = stripper.getCharactersByArticle();
    for (int i = 0; i < vectorlistoftps.size(); i++) {
        List<TextPosition> tplist = vectorlistoftps.get(i);
        for (int j = 0; j < tplist.size(); j++) {
            TextPosition text = tplist.get(j);
            // Print the position and size data of each character.
            System.out.println(" String "
                + "[x: " + text.getXDirAdj() + ", y: "
                + text.getY() + ", height:" + text.getHeightDir()
                + ", space: " + text.getWidthOfSpace() + ", width: "
                + text.getWidthDirAdj() + ", yScale: " + text.getYScale() + "]"
                + text.getCharacter());
        }
    }
}

Each TextPosition carries a good deal of information about a character in the PDF document.

OUTPUT:

String [x: 168.24, y: 64.15997, height:6.061287, space: 8.9664, width: 3.4879303, yScale: 8.9664]J

String [x: 171.69745, y: 64.15997, height:6.061287, space: 8.9664, width: 2.2416077, yScale: 8.9664]N

String [x: 176.25777, y: 64.15997, height:6.0343876, space: 8.9664, width: 6.4737396, yScale: 8.9664]N

String [x: 182.73778, y: 64.15997, height:4.214208, space: 8.9664, width: 3.981079, yScale: 8.9664]e .....
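The answer stops at printing the positions. As a rough sketch of one way the x/y values could be used, the code below flags a character that repeats the previous one at almost the same position, which is one common way artificial bold is emulated (the glyph is drawn twice with a small offset). This heuristic is an assumption and is not part of the original answer. Glyph is a hypothetical stand-in for the values read from a TextPosition, and the 0.5 threshold is an arbitrary illustrative choice. It does not cover bold produced through the text rendering mode (fill plus stroke). Also note that PDFTextStripper suppresses duplicate overlapping text by default, so setSuppressDuplicateOverlappingText(false) may be needed before such duplicates show up at all.

```java
import java.util.ArrayList;
import java.util.List;

public class OverprintDetector {

    /** Hypothetical stand-in for the character and x/y values of a TextPosition. */
    static final class Glyph {
        final String ch;
        final float x;
        final float y;

        Glyph(String ch, float x, float y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Returns every glyph that repeats the character of the glyph directly
     * before it and lies within maxOffset units of it in both x and y.
     */
    static List<Glyph> findOverprinted(List<Glyph> glyphs, float maxOffset) {
        List<Glyph> hits = new ArrayList<>();
        for (int i = 1; i < glyphs.size(); i++) {
            Glyph prev = glyphs.get(i - 1);
            Glyph cur = glyphs.get(i);
            boolean sameChar = cur.ch.equals(prev.ch);
            boolean close = Math.abs(cur.x - prev.x) <= maxOffset
                    && Math.abs(cur.y - prev.y) <= maxOffset;
            if (sameChar && close) {
                hits.add(cur);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("J", 168.24f, 64.16f));
        glyphs.add(new Glyph("J", 168.54f, 64.16f)); // same letter drawn again, slightly shifted
        glyphs.add(new Glyph("N", 171.70f, 64.16f));
        glyphs.add(new Glyph("N", 176.26f, 64.16f)); // a genuine second "N", several units away

        for (Glyph g : findOverprinted(glyphs, 0.5f)) {
            System.out.println("possible artificial bold: " + g.ch + " at x=" + g.x);
        }
    }
}
```

In a real check, the Glyph list would be filled from the TextPosition objects of each article, for example from getCharacter(), getXDirAdj() and getYDirAdj().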
