How can I get image coordinates in a PDF into a JSON file?

吃可爱长大的小学妹 submitted on 2019-12-06 06:30:43

I was able to find images by searching for the cm operator. I overrode PDFTextStripper as follows. Note: this doesn't take rotation or mirroring into account!

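// Imports for PDFBox 1.8.x, the API version this snippet uses.
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSNumber;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFOperator;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public static class TextFinder extends PDFTextStripper {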

    public TextFinder() throws IOException {
        super();
    }

    @Override
    protected void startPage(PDPage page) throws IOException {
        // process start of the page
        super.startPage(page);
    }

    @Override
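    protected void processOperator(PDFOperator operator, List<COSBase> arguments)
            throws IOException {

        // cm concatenates the matrix [a b c d e f] onto the current transformation matrix.
        // For an unrotated, unmirrored image: a = width, d = height, (e, f) = lower-left corner.
        // cm is also used for non-image content, so not every match is an image.
        if ("cm".equals(operator.getOperation())) {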
            float width = ((COSNumber)arguments.get(0)).floatValue();
            float height = ((COSNumber)arguments.get(3)).floatValue();
            float x = ((COSNumber)arguments.get(4)).floatValue();
            float y = ((COSNumber)arguments.get(5)).floatValue();
            // process image coordinates
        }
        super.processOperator(operator, arguments);
    }

    @Override
    protected void writeString(String text,
            List<TextPosition> textPositions) throws IOException {
        for (TextPosition position : textPositions) {
            // process text coordinates
        }
        super.writeString(text, textPositions);
    }
}
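
The snippet above reads width and height straight from the first and fourth cm operands, which only holds for unrotated, unmirrored images. A minimal sketch of one way to lift that restriction: a PDF image fills the unit square in its own space, so mapping the four corners of that square through the matrix gives an axis-aligned bounding box. The class name ImageBounds and its method are hypothetical, not part of PDFBox, and the sketch assumes the six numbers are the complete transformation in effect when the image is drawn (nested cm operators would have to be multiplied together first).

```java
/** Hypothetical helper, not part of PDFBox: bounding box of an image drawn under [a b c d e f]. */
final class ImageBounds {
    final float x;       // left edge of the bounding box
    final float y;       // bottom edge of the bounding box
    final float width;
    final float height;

    private ImageBounds(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Assumes the six values form the full matrix in effect for the image. */
    static ImageBounds fromMatrix(float a, float b, float c, float d, float e, float f) {
        // Corners (0,0), (1,0), (0,1), (1,1) of the unit square,
        // mapped with x' = a*x + c*y + e and y' = b*x + d*y + f.
        float[] xs = { e, a + e, c + e, a + c + e };
        float[] ys = { f, b + f, d + f, b + d + f };

        float minX = xs[0], maxX = xs[0], minY = ys[0], maxY = ys[0];
        for (int i = 1; i < 4; i++) {
            minX = Math.min(minX, xs[i]);
            maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]);
            maxY = Math.max(maxY, ys[i]);
        }
        return new ImageBounds(minX, minY, maxX - minX, maxY - minY);
    }
}
```

For example, ImageBounds.fromMatrix(0, 200, -100, 0, 150, 60) describes an image rotated by 90 degrees and yields x = 50, y = 60, width = 100, height = 200.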

Of course, if you only need the images and not the text, you can extend PDFStreamEngine instead of PDFTextStripper.
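
The question also asks for a JSON file, which the code above does not produce. A minimal sketch of that last step, using only the Java standard library: collect one entry per image in the "process image coordinates" branch, then serialize the list. The class ImageCoordinatesJson, its Entry type, and the JSON field names are all hypothetical choices for illustration; a JSON library such as Jackson or Gson would work just as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: serializes collected image coordinates as a JSON array. */
final class ImageCoordinatesJson {

    /** One image placement; the field names are illustrative. */
    static final class Entry {
        final int page;
        final float x, y, width, height;

        Entry(int page, float x, float y, float width, float height) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Builds the JSON text; Locale.ROOT keeps '.' as the decimal separator. */
    static String toJson(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append(String.format(Locale.ROOT,
                    "{\"page\":%d,\"x\":%.2f,\"y\":%.2f,\"width\":%.2f,\"height\":%.2f}",
                    e.page, e.x, e.y, e.width, e.height));
        }
        return sb.append(']').toString();
    }

    /** Writes the JSON array to the given file as UTF-8. */
    static void write(Path target, List<Entry> entries) throws IOException {
        Files.write(target, toJson(entries).getBytes(StandardCharsets.UTF_8));
    }
}
```

Only numbers are written, so no string escaping is needed here; adding text fields (for example an image name) would require proper JSON escaping or a library.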
