Forcing POS tags in Stanford CoreNLP

Submitted by 故事扮演 on 2019-12-12 02:08:47

Question


Is there a way to process an already POS-tagged text using Stanford CoreNLP?

For example, I have a sentence in this format:

They_PRP are_VBP hunting_VBG dogs_NNS ._.

and I'd like to annotate it with lemma, ner, parse, etc., while forcing the given POS annotations.

Update: I tried this code, but it's not working.

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma"); 

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String sentText = "They_PRP are_VBP hunting_VBG dogs_NNS ._.";
List<CoreLabel> sentence = new ArrayList<>();

String[] parts = sentText.split("\\s");
for (String p : parts) {
    String[] split = p.split("_");
    CoreLabel clToken = new CoreLabel();
    clToken.setValue(split[0]);
    clToken.setWord(split[0]);
    clToken.setOriginalText(split[0]);
    clToken.set(CoreAnnotations.PartOfSpeechAnnotation.class, split[1]);
    sentence.add(clToken);
}
Annotation s = new Annotation(sentText);
s.set(CoreAnnotations.TokensAnnotation.class, sentence);

Annotation document = new Annotation(s);
pipeline.annotate(document);

Answer 1:


The POS annotations will certainly be replaced if you include the pos annotator in the pipeline.

Instead, remove the pos annotator and add the option -enforceRequirements false. This lets the pipeline run even though the pos annotator, which lemma and the other downstream annotators depend on, is not present. Add the following line before instantiating the pipeline:

props.setProperty("enforceRequirements", "false");

Of course, behavior is undefined if you venture into this area without setting the proper annotations, so make sure your tokens carry the same annotations that the skipped annotator (POSTaggerAnnotator in this case) would have set.
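Setting the annotations by hand starts with splitting each word_TAG item reliably. The sketch below uses only the Java standard library and is not part of CoreNLP: the class TaggedTextParser and its TaggedToken holder are hypothetical names introduced for illustration. It assumes items are separated by whitespace and that the tag follows the last underscore, so a word that itself contains an underscore is not cut in the wrong place, which a plain split("_") would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a CoreNLP class): parses "word_TAG" items.
class TaggedTextParser {

    // Simple holder for one word and its forced POS tag.
    static final class TaggedToken {
        final String word;
        final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // Splits whitespace-separated "word_TAG" items on the LAST underscore,
    // so a word such as "snake_case" keeps its own underscore.
    static List<TaggedToken> parse(String taggedText) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String item : taggedText.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank input yields a single empty item
            }
            int sep = item.lastIndexOf('_');
            // Reject items with no underscore, an empty word, or an empty tag.
            if (sep <= 0 || sep == item.length() - 1) {
                throw new IllegalArgumentException("Expected word_TAG but got: " + item);
            }
            tokens.add(new TaggedToken(item.substring(0, sep), item.substring(sep + 1)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one word and its tag per line, separated by a tab.
        for (TaggedToken t : parse("They_PRP are_VBP hunting_VBG dogs_NNS ._.")) {
            System.out.println(t.word + "\t" + t.tag);
        }
    }
}
```

Each resulting pair can then be copied onto a CoreLabel the same way the loop in the question does, with setWord for the word and CoreAnnotations.PartOfSpeechAnnotation.class for the tag.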



Source: https://stackoverflow.com/questions/29518946/forcing-pos-tags-in-stanford-corenlp
