How to find the image or picture location while parsing an MS Word .doc in Java using Apache POI

Submitted by 。_饼干妹妹 on 2019-12-01 14:39:22

You're getting at the pictures the wrong way, which is why you're not finding any positions!

What you need to do is process each CharacterRun of the document in turn. Pass each run to the PicturesTable and check whether it contains a picture. If it does, fetch the picture back from the table; you then know where in the document it belongs, because you have the run it comes from.

At the simplest, it'd be something like:

// PicturesSource is the helper class listed further down, not a POI class
PicturesSource pictures = new PicturesSource(document);
PicturesTable pictureTable = document.getPicturesTable();

Range r = document.getRange();
for (int i = 0; i < r.numParagraphs(); i++) {
    Paragraph p = r.getParagraph(i);
    for (int j = 0; j < p.numCharacterRuns(); j++) {
        CharacterRun cr = p.getCharacterRun(j);
        if (pictureTable.hasPicture(cr)) {
            Picture picture = pictures.getFor(cr);
            // Do something useful with the picture; it sits in
            // paragraph i, character run j of the document
        }
    }
}

You can find a good example of doing this in the Apache Tika parser for Microsoft Word .doc files, which is powered by Apache POI.
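To make "you know where it belongs" concrete, here is a minimal sketch of the same paragraph/run walk that records a location for each picture. It runs on plain Java only: `Run`, `Location`, and `PictureLocationSketch` are hypothetical stand-ins invented for illustration, not Apache POI classes. With POI, the equivalent values would be the loop indices `i` and `j`, plus (as an assumption about how you want to express position) the run's `getStartOffset()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in model only: these types are hypothetical, not Apache POI classes. */
final class PictureLocationSketch {

    /** Stand-in for a character run; pictureName is null when the run has no picture. */
    static final class Run {
        final String text;
        final String pictureName;
        Run(String text, String pictureName) {
            this.text = text;
            this.pictureName = pictureName;
        }
    }

    /** Where a picture was found in the document. */
    static final class Location {
        final String pictureName;
        final int paragraph;   // index of the containing paragraph
        final int run;         // index of the run inside that paragraph
        final int charOffset;  // character offset of the run in the whole text
        Location(String pictureName, int paragraph, int run, int charOffset) {
            this.pictureName = pictureName;
            this.paragraph = paragraph;
            this.run = run;
            this.charOffset = charOffset;
        }
    }

    /** Walks paragraphs and runs in order, recording every run that carries a picture. */
    static List<Location> locate(List<List<Run>> paragraphs) {
        List<Location> found = new ArrayList<>();
        int offset = 0; // running character offset across the document
        for (int i = 0; i < paragraphs.size(); i++) {
            List<Run> runs = paragraphs.get(i);
            for (int j = 0; j < runs.size(); j++) {
                Run r = runs.get(j);
                if (r.pictureName != null) {
                    found.add(new Location(r.pictureName, i, j, offset));
                }
                offset += r.text.length();
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Run> first = Arrays.asList(new Run("Intro text. ", null));
        List<Run> second = Arrays.asList(
                new Run("See figure: ", null),
                new Run("\u0001", "chart.png")); // placeholder character for the picture
        for (Location loc : locate(Arrays.asList(first, second))) {
            System.out.printf("%s at paragraph %d, run %d, offset %d%n",
                    loc.pictureName, loc.paragraph, loc.run, loc.charOffset);
        }
    }
}
```

Keeping the location as a small value object means it can be stored next to the extracted image and used later to put a marker back into the extracted text.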

You should also add the PicturesSource class used above. It is a helper class you define yourself, not part of the POI API:

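import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class PicturesSource {

    private PicturesTable picturesTable;
    private Set<Picture> output = new HashSet<Picture>();
    private Map<Integer, Picture> lookup;  // picture start offset -> picture
    private List<Picture> nonU1based;      // pictures that no character run points at
    private List<Picture> all;
    private int pn = 0;                    // cursor used by nextUnclaimed()

    public PicturesSource(HWPFDocument doc) {
        picturesTable = doc.getPicturesTable();
        all = picturesTable.getAllPictures();

        // Index every picture by its start offset so a run can find it
        lookup = new HashMap<Integer, Picture>();
        for (Picture p : all) {
            lookup.put(p.getStartOffset(), p);
        }

        // Start with all pictures, then null out each one claimed by a run
        nonU1based = new ArrayList<Picture>();
        nonU1based.addAll(all);
        Range r = doc.getRange();
        for (int i = 0; i < r.numCharacterRuns(); i++) {
            CharacterRun cr = r.getCharacterRun(i);
            if (picturesTable.hasPicture(cr)) {
                Picture p = getFor(cr);
                int at = nonU1based.indexOf(p);
                // Guard: the run may match no picture, or one already claimed
                if (p != null && at >= 0) {
                    nonU1based.set(at, null);
                }
            }
        }
    }

    // The helpers below are public so they can be called from outside this class
    public boolean hasPicture(CharacterRun cr) {
        return picturesTable.hasPicture(cr);
    }

    public void recordOutput(Picture picture) {
        output.add(picture);
    }

    public boolean hasOutput(Picture picture) {
        return output.contains(picture);
    }

    // 1-based position of the picture in the pictures table
    public int pictureNumber(Picture picture) {
        return all.indexOf(picture) + 1;
    }

    // The picture whose start offset matches the run's picture offset, or null
    public Picture getFor(CharacterRun cr) {
        return lookup.get(cr.getPicOffset());
    }

    // Next picture that no character run referenced, or null when none are left
    public Picture nextUnclaimed() {
        Picture p = null;
        while (pn < nonU1based.size()) {
            p = nonU1based.get(pn);
            pn++;
            if (p != null) return p;
        }
        return null;
    }

}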
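The core idea of the class is an offset-keyed lookup plus a list of pictures that no run claims. The sketch below models just that idea in plain Java so it can be run without POI on the classpath. `Pic` and `OffsetLookupSketch` are hypothetical names, the integer offsets stand in for the values a run and a picture would report, and unclaimed pictures are removed from a list rather than nulled out, which is a simplification of the class above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in model only: Pic and the int offsets are hypothetical, not Apache POI types. */
final class OffsetLookupSketch {

    /** Stand-in for a picture stored at a given start offset. */
    static final class Pic {
        final String name;
        final int startOffset;
        Pic(String name, int startOffset) {
            this.name = name;
            this.startOffset = startOffset;
        }
    }

    private final Map<Integer, Pic> lookup = new HashMap<>();
    private final List<Pic> unclaimed = new ArrayList<>();

    /** runPicOffsets holds the picture offset of every run that reports a picture. */
    OffsetLookupSketch(List<Pic> all, List<Integer> runPicOffsets) {
        for (Pic p : all) {
            lookup.put(p.startOffset, p);
        }
        unclaimed.addAll(all);
        for (int offset : runPicOffsets) {
            Pic p = lookup.get(offset);
            if (p != null) {
                unclaimed.remove(p); // this picture is anchored to a run
            }
        }
    }

    /** Picture for a run's picture offset, or null if nothing starts there. */
    Pic getFor(int runPicOffset) {
        return lookup.get(runPicOffset);
    }

    /** Pictures that no run pointed at, in table order. */
    List<Pic> unclaimed() {
        return Collections.unmodifiableList(unclaimed);
    }

    public static void main(String[] args) {
        Pic a = new Pic("a.png", 0);
        Pic b = new Pic("b.png", 512);
        OffsetLookupSketch s = new OffsetLookupSketch(Arrays.asList(a, b), Arrays.asList(512));
        System.out.println("run at 512 -> " + s.getFor(512).name);
        for (Pic p : s.unclaimed()) {
            System.out.println("unclaimed: " + p.name);
        }
    }
}
```

Pictures that end up unclaimed have no run to give them a position, so they can only be reported as belonging to the document as a whole.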
