Parsing either font style or block of paragraph in GATE

半腔热情 submitted on 2019-12-24 12:47:13

Question


I have a Word document. I need to match a particular table section or heading section of it using GATE. Is there a way to first check the font size or font style of a heading and then match the rest of the content until the next heading pattern occurs?


Answer 1:


GATE has only limited support for MS Word documents, provided by the Apache Tika and Apache POI libraries. I do not know of any free alternative. We have developed our own plugin (gate.DocumentFormat) for this purpose at my company, but it is not publicly available for now.

You can try converting your Word documents to HTML with some other tool (e.g. MS Word itself, OpenOffice, docx4j, or others; searching for "docx to html" returns many results) and then process the HTML documents in GATE instead. All the formatting will then be available in the Original markups annotation set.
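To illustrate the "heading, then everything up to the next heading" idea on converted HTML, here is a minimal sketch in plain Python. It does not use GATE at all; it only shows the segmentation logic that you would otherwise express over the heading annotations in GATE. It assumes the converter emits standard `h1`..`h6` tags for headings (some converters use styled `p` or `span` elements instead, in which case you would have to inspect their attributes). The names `SectionSplitter` and `split_by_heading` are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: the docx-to-HTML converter marks headings with h1..h6.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionSplitter(HTMLParser):
    """Collect [heading_text, body_text] pairs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.sections = []        # one [heading, body] entry per heading seen
        self._in_heading = False  # True while inside an open heading tag

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # A new heading ends the previous section and starts a new one.
            self.sections.append(["", ""])
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored
        slot = 0 if self._in_heading else 1
        self.sections[-1][slot] += data


def split_by_heading(html):
    """Return a list of (heading, body) tuples with whitespace normalized."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [(h.strip(), " ".join(b.split())) for h, b in parser.sections]


sample = (
    "<h1>Intro</h1><p>First part.</p>"
    "<h1>Results</h1><table><tr><td>42</td></tr></table>"
)
for heading, body in split_by_heading(sample):
    print(heading, "->", body)
```

Table cells and paragraphs are both collected as plain text under the nearest preceding heading, so a table section can be picked out by its heading text. Text from adjacent elements is concatenated without an added separator, which is good enough for a sketch but may need refining for real documents.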



Source: https://stackoverflow.com/questions/33255580/parsing-either-font-style-or-block-of-paragraph-in-gate
