Configuring SUTime to use custom rule files

Submitted by 社会主义新天地 on 2021-01-28 04:47:19

Question


I am trying to configure the SUTime annotator (part of "ner") to use my own date/time rule files INSTEAD of the out-of-the-box rule files located in "models/sutime/" in the distribution JAR for the Stanford CoreNLP models.

I want to do this so that I can slightly modify what the SUTime rules do.

According to the official SUTime documentation, all it takes is setting the "sutime.rules" property to a comma-separated list of file paths.
But after I did that, CoreNLP still appears to load the out-of-the-box rule files:

Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

I tried both absolute paths and paths relative to my project root, with the same result.
It appears that, contrary to the documentation, the "sutime.rules" property is simply being ignored.
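For reference, the configuration described above can be sketched as follows. This is only an illustration of the attempted setup, not a fix: per the log output, the property did not take effect. The class name, the annotator list, and the rule file paths are assumptions made for the example; only the "sutime.rules" property name comes from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch only: builds the pipeline properties described in the question.
// SUTimeRulesConfig is a hypothetical helper, not part of CoreNLP.
class SUTimeRulesConfig {

    static Properties withCustomRules(List<String> ruleFiles) {
        Properties props = new Properties();
        // Assumed annotator list; SUTime runs as part of the "ner" step.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // "sutime.rules" expects a comma-separated list of rule file paths.
        props.setProperty("sutime.rules", String.join(",", ruleFiles));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical paths to locally modified copies of the rule files.
        Properties props = withCustomRules(Arrays.asList(
                "rules/defs.sutime.txt",
                "rules/english.sutime.txt",
                "rules/english.holidays.sutime.txt"));
        System.out.println(props.getProperty("sutime.rules"));
    }
}
```

The resulting Properties object would then be passed to the StanfordCoreNLP constructor.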

Any help will be greatly appreciated.

UPDATE:

The workaround in the form of:

  1. turning off SUTime as a part of the "ner" step
  2. copying its rule files and modifying them as necessary
  3. creating a custom annotator based on the TimeAnnotator class and adding it to the pipeline
  4. setting the .rules properties to the modified rule files

does not work.
The pipeline runs, but the functionality is not the same. The TimeAnnotator constructor needs to be invoked with the "sutime" parameter in order for its functionality to be exactly the same as if it were being called in the "ner" step.
This cannot be done via properties, it seems.
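The four workaround steps can be sketched as a set of properties. This is a guess at what such a configuration might look like, not the asker's actual code, and according to the update above the resulting behavior was not the same as the built-in "ner" step. The property names "ner.useSUTime" and "customAnnotatorClass.<name>" are assumptions based on CoreNLP's property conventions and are not taken from the question; the class and method names are hypothetical.

```java
import java.util.Properties;

// Sketch only: properties for running TimeAnnotator as a separate,
// custom annotator. SUTimeWorkaroundConfig is a hypothetical helper.
class SUTimeWorkaroundConfig {

    static Properties build(String annotatorName, String ruleFiles) {
        Properties props = new Properties();
        // Step 1 (assumed property name): turn off SUTime inside "ner".
        props.setProperty("ner.useSUTime", "false");
        // Step 3 (assumed mechanism): register TimeAnnotator under a custom name
        // and append that name to the annotator list.
        props.setProperty("customAnnotatorClass." + annotatorName,
                "edu.stanford.nlp.time.TimeAnnotator");
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner," + annotatorName);
        // Step 4: point "<name>.rules" at the modified copies from step 2.
        props.setProperty(annotatorName + ".rules", ruleFiles);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical rule file paths.
        Properties props = build("sutime",
                "rules/defs.sutime.txt,rules/english.sutime.txt");
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```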


Answer 1:


Thank you for letting us know that this is not working. We will look into this and fix it for the next release. If you do need to change the rules files slightly, you can try to place your own copy of edu/stanford/nlp/models/sutime/english.sutime.txt in the classpath before the CoreNLP models jar.
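When relying on classpath ordering like this, it can help to check which copy of the rule file the JVM actually resolves. The following is a minimal diagnostic sketch using only the standard library; the class name is hypothetical, and the resource path is the one given in the answer above.

```java
import java.net.URL;

// Sketch only: reports which copy of a resource is found first on the classpath.
// SUTimeRulesLocator is a hypothetical helper, not part of CoreNLP.
class SUTimeRulesLocator {

    static final String RULES_RESOURCE =
            "edu/stanford/nlp/models/sutime/english.sutime.txt";

    static String describe(String resource) {
        // getSystemResource returns the first match in classpath search order,
        // or null if the resource is not on the classpath at all.
        URL url = ClassLoader.getSystemResource(resource);
        return url == null
                ? "not found on classpath: " + resource
                : url.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(RULES_RESOURCE));
    }
}
```

If the printed URL points into the CoreNLP models jar rather than at your own copy, your copy is not ahead of the jar on the classpath.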




Answer 2:


I too had a need to override the english.sutime.txt file. I accomplished this by creating an NERClassifierCombiner and using that when instantiating the NERCombinerAnnotator. Pseudo code:

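import java.util.Properties;
import java.util.Set;
import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.pipeline.NERCombinerAnnotator;
import edu.stanford.nlp.util.Generics;

Properties nerProps = new Properties();
nerProps.setProperty("sutime.rules", "your new comma separated file list");
// Pass "sutime.rules" down to the combiner in addition to the default properties.
Set<String> passDownProps = Generics.newHashSet();
passDownProps.addAll(NERClassifierCombiner.DEFAULT_PASS_DOWN_PROPERTIES);
passDownProps.add("sutime.rules");
NERClassifierCombiner combiner = NERClassifierCombiner.createNERClassifierCombiner("giveItAName", passDownProps, nerProps);
NERCombinerAnnotator nerAnnotator = new NERCombinerAnnotator(combiner, false);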

Hope that helps.



Source: https://stackoverflow.com/questions/31970286/configuring-sutime-to-use-custom-rule-files
