How to extract formatting information from a Word document using Apache POI?

Submitted by 孤人 on 2019-12-24 03:43:23

Question


I am using Apache POI to extract formatting information from MS Word files.

I want to extract information such as whether a paragraph has a bullet, its background color, foreground color, alignment, etc.

There is not much documentation, and there are few tutorials, available for this. The Javadoc does not contain much helpful information either.

Where can I find tutorials or good documentation to help me learn the Apache POI API?


Answer 1:


For HWPF (.doc), the classes you probably want are:

  • http://poi.apache.org/apidocs/org/apache/poi/hwpf/usermodel/ParagraphProperties.html
  • http://poi.apache.org/apidocs/org/apache/poi/hwpf/usermodel/CharacterProperties.html
  • http://poi.apache.org/apidocs/org/apache/poi/hwpf/model/StyleDescription.html

Depending on the exact property you want, it may be on the paragraph or the character properties.
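As a rough illustration of where those properties live, here is a minimal sketch that walks the paragraphs and character runs of a .doc file and prints some formatting. It is not taken from the answer: the class name HwpfFormatDump and the helper alignmentName are made up for this example, and the mapping of justification codes to names (0 = left, 1 = center, 2 = right, 3 = justified) is an assumption you should check against your own documents.

```java
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.StyleDescription;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper class, for illustration only.
class HwpfFormatDump {

    // Assumed mapping of the numeric justification code to a readable name.
    static String alignmentName(int jc) {
        switch (jc) {
            case 0: return "left";
            case 1: return "center";
            case 2: return "right";
            case 3: return "justified";
            default: return "other(" + jc + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .doc file.
        try (FileInputStream in = new FileInputStream(args[0]);
             HWPFDocument doc = new HWPFDocument(in)) {
            Range range = doc.getRange();
            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph p = range.getParagraph(i);

                // Paragraph-level properties: style, alignment, list membership.
                StyleDescription style =
                        doc.getStyleSheet().getStyleDescription(p.getStyleIndex());
                String styleName = (style == null) ? "?" : style.getName();
                System.out.println("Paragraph " + i
                        + " style=" + styleName
                        + " align=" + alignmentName(p.getJustification())
                        + " inList=" + p.isInList());

                // Character-level properties: one entry per formatting run.
                for (int j = 0; j < p.numCharacterRuns(); j++) {
                    CharacterRun run = p.getCharacterRun(j);
                    System.out.println("  run bold=" + run.isBold()
                            + " italic=" + run.isItalic()
                            + " font=" + run.getFontName()
                            + " halfPoints=" + run.getFontSize()
                            + " colorIndex=" + run.getColor()
                            + " text=" + run.text().trim());
                }
            }
        }
    }
}
```

Note that getColor() returns a palette index rather than an RGB value, and getFontSize() is in half-points, so a value of 24 means 12pt.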

The best example I can think of for reading a Word document with HWPF, getting the text, and checking styles and formatting is WordExtractor from Apache Tika: https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java

(XWPF for .docx is similar)
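To show what "similar" might look like for .docx, the following sketch reads paragraph and run formatting through the XWPF usermodel. The class name XwpfFormatDump and the method describe are hypothetical names chosen for this example; the POI calls used are the standard XWPFParagraph and XWPFRun getters. Treating a non-null numbering ID as "is in a list" is an assumption, and it does not distinguish bullets from numbered lists.

```java
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical helper class, for illustration only.
class XwpfFormatDump {

    // Builds a one-line summary of a paragraph's formatting and its runs.
    static String describe(XWPFParagraph p) {
        StringBuilder sb = new StringBuilder();
        sb.append("align=").append(p.getAlignment());      // ParagraphAlignment enum
        sb.append(" style=").append(p.getStyle());         // style ID, may be null
        sb.append(" inList=").append(p.getNumID() != null); // assumption: numbering ID means list item
        for (XWPFRun r : p.getRuns()) {
            sb.append(" [text=").append(r.text())
              .append(" bold=").append(r.isBold())
              .append(" italic=").append(r.isItalic())
              .append(" color=").append(r.getColor())      // hex RGB string, may be null
              .append("]");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .docx file.
        try (FileInputStream in = new FileInputStream(args[0]);
             XWPFDocument doc = new XWPFDocument(in)) {
            // getParagraphs() covers body paragraphs only, not those inside tables.
            for (XWPFParagraph p : doc.getParagraphs()) {
                System.out.println(describe(p));
            }
        }
    }
}
```

Unlike HWPF, the run color here comes back as a hex RGB string (or null when the run inherits its color from a style), so properties that are not set directly on the run may need to be resolved through the paragraph's style.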



Source: https://stackoverflow.com/questions/5456027/how-to-extract-formatting-information-of-word-document-using-apache-poi
