I am trying to read a Microsoft Word 2003 document (.doc) using poi-scratchpad-3.8 (HWPF). I need to read the file either word by word or character by character. Either way I
I would suggest taking a look at the source code of WordExtractor from Apache Tika. It's a great example of getting text and styling out of a Word document using Apache POI.
Based on what you did and didn't say in your question, I suspect you're looking for something a little like this:
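import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.StyleDescription;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

// document is an HWPFDocument, e.g. new HWPFDocument(new FileInputStream("your.doc"))
Range r = document.getRange();
for (int i = 0; i < r.numParagraphs(); i++) {
    Paragraph p = r.getParagraph(i);
    String text = p.text();
    // Only look the style up if its index falls inside the document's stylesheet
    if (document.getStyleSheet().numStyles() > p.getStyleIndex()) {
        StyleDescription style =
            document.getStyleSheet().getStyleDescription(p.getStyleIndex());
        String styleName = style.getName();
        System.out.println(styleName + " -> " + text);
    } else {
        // Text has an unknown or invalid style
    }
}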
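The loop above hands you one paragraph at a time. To go word by word or character by character, you can scan each paragraph's string yourself. Here is a minimal sketch in plain Java, assuming you feed it the string returned by p.text(). The class and method names (ParagraphTokens, words, characters) are hypothetical. Treating '\r' (paragraph mark), '\u0007' (table cell mark), '\u000B' (manual line break) and '\f' (page/section break) as separators is an assumption about typical HWPF text, not a complete list. Field codes, for example, are not handled here.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphTokens {

    // Assumed set of Word control characters that should not count as text
    static boolean isWordControl(char c) {
        return c == '\r' || c == '\n' || c == '\u0007' || c == '\u000B' || c == '\f';
    }

    // Walk the paragraph character by character, cutting words at whitespace
    // or Word control characters
    static List<String> words(String paragraphText) {
        List<String> result = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : paragraphText.toCharArray()) {
            if (isWordControl(c) || Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    result.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    // Every character of the paragraph except the control characters above
    static List<Character> characters(String paragraphText) {
        List<Character> result = new ArrayList<Character>();
        for (char c : paragraphText.toCharArray()) {
            if (!isWordControl(c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for p.text() from the paragraph loop
        String sample = "Hello HWPF world\r";
        System.out.println(words(sample));             // [Hello, HWPF, world]
        System.out.println(characters(sample).size()); // 16
    }
}
```

Inside the paragraph loop you would call ParagraphTokens.words(text) or ParagraphTokens.characters(text) instead of printing the whole paragraph. Working per paragraph keeps the style lookup above available for each word if you need it.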
For anything more advanced, take a look at the WordExtractor source code and see what else you can do with this sort of thing!