Read annotated data from GATE datastore

Posted by 感情迁移 on 2019-12-08 02:50:34

Question


I use GATE to manually annotate a large number of texts with the emotions they contain. To process these texts further, I would like to export them from the datastore into my own Java application. I haven't found any documentation on how to do that. I have already written a program that imports data into the datastore, but I have no idea how to get the annotated text back out. I also tried opening the Lucene-based datastore with Luke (https://code.google.com/p/luke/), a tool that can read a Lucene index, but it could not open the GATE Lucene datastore. Does anyone have an idea how to read the annotated text from the datastore?


Answer 1:


You can use GATE APIs to load the documents from the datastore and then export them as GATE XML in the normal way (imports and exception handling omitted):

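Gate.init();
// Open the Lucene-based datastore; the URL must point at the datastore directory
DataStore ds = Factory.openDataStore("gate.creole.annic.SearchableDataStore", "file:/path/to/datastore");
// IDs of all documents persisted in the datastore
List docIds = ds.getLrIds("gate.corpora.DocumentImpl");
for(Object id : docIds) {
  // Load one document from the datastore
  Document d = (Document)Factory.createResource("gate.corpora.DocumentImpl",
            gate.Utils.featureMap(DataStore.DATASTORE_FEATURE_NAME, ds,
                                  DataStore.LR_ID_FEATURE_NAME, id));
  try {
    File outputFile = new File(...); // based on doc name, sequential number, etc.
    // Write the document, including its annotations, as GATE XML
    DocumentStaxUtils.writeDocument(d, outputFile);
  } finally {
    // Unload the document so its memory is released before the next one is loaded
    Factory.deleteResource(d);
  }
}
// Close the datastore once all documents have been exported
ds.close();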
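
The output file in the loop above is left open ("based on doc name, sequential number, etc."). One possible way to build it is sketched below. The class OutputNames and the method outputFileFor are hypothetical helpers made up for illustration, not part of GATE, and the naming scheme (zero-padded index, sanitized name, .xml suffix) is only an assumption:

```java
import java.io.File;

// Hypothetical helper, not part of GATE: builds a unique, filesystem-safe
// output file for one exported document.
final class OutputNames {

    static File outputFileFor(File outputDir, String docName, int index) {
        // Replace everything except letters, digits, dot, underscore and hyphen
        String safe = docName.replaceAll("[^A-Za-z0-9._-]", "_");
        // The zero-padded index keeps names unique even if two documents share a name
        return new File(outputDir, String.format("%04d_%s.xml", index, safe));
    }

    public static void main(String[] args) {
        File f = outputFileFor(new File("export"), "my doc/1.txt", 7);
        System.out.println(f.getName()); // prints 0007_my_doc_1.txt.xml
    }
}
```

Inside the loop, d.getName() and a running counter could supply the docName and index arguments.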

If you want to write the annotations as inline XML then replace DocumentStaxUtils.writeDocument with something like

Set<String> types = new HashSet<String>();
types.add("Person");
types.add("Location"); // and whatever others you're interested in
FileUtils.write(outputFile, d.toXml(d.getAnnotations().get(types), true));

(I'm using FileUtils from Apache commons-io for convenience but you could equally handle opening and closing the file yourself).
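
If you would rather not depend on commons-io, handling the file yourself could look roughly like this. It is a minimal sketch using only the JDK: the class XmlFileWriter and the method writeUtf8 are hypothetical names, and writing UTF-8 is an assumption about the encoding you want:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of GATE or commons-io
final class XmlFileWriter {

    static void writeUtf8(File outputFile, String content) throws IOException {
        // try-with-resources closes the writer even if write() throws
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(outputFile), StandardCharsets.UTF_8)) {
            out.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("gate-export", ".xml");
        // Stand-in string; in the real export this would be the result of d.toXml(...)
        writeUtf8(tmp, "<Person>Alice</Person> lives in <Location>Paris</Location>");
        System.out.println(new String(Files.readAllBytes(tmp.toPath()), StandardCharsets.UTF_8));
        tmp.delete();
    }
}
```

In the export loop, the FileUtils.write call would then become a call to writeUtf8 with the same output file and the string returned by d.toXml.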



Source: https://stackoverflow.com/questions/19677972/read-annotated-data-from-gate-datastore
