Question
I am testing the new 3.9.0 way of getting entity mentions with CoreEntityMention. I do something like this:
```java
CoreDocument document = new CoreDocument(text);
// createNerPipeline() and stringify(cem) are helper methods defined elsewhere (not shown)
stanfordPipe = createNerPipeline();
stanfordPipe.annotate(document);
for (CoreSentence sentence : document.sentences()) {
    logger.debug("Found sentence {}", sentence);
    if (sentence.entityMentions() == null) continue;
    // each CoreEntityMention may span several tokens
    for (CoreEntityMention cem : sentence.entityMentions()) {
        logger.debug("Found em {}", stringify(cem));
    }
}
```
When I iterate through entity mentions using sentence.entityMentions(), I see that some of the entity mentions produced span multiple tokens. As far as I know (correct me if I am wrong), the old way of getting entity mentions was to iterate over the CoreLabel tokens, so you had to combine multi-token entity mentions yourself.
So is this a new method, which did not exist before, that combines adjacent tokens with the same NER label? Or have I missed older ways of combining multi-token entity mentions?
Answer 1:
Hi, thanks for using the new interface!
Yes, a CoreEntityMention is meant to represent a full entity mention. This is new syntax we added to make our code easier to work with.
Traditionally you needed things like sentence.get(CoreAnnotations.TokensAnnotation.class), etc., so we added some wrapper classes that let people use the pipeline interface without the cumbersome syntax.
With the new syntax, you can simply write:
```java
sentence.tokens();
```
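As a rough illustration of the wrapper idea, the toy below models a typesafe map whose slots are identified by key classes (similar in spirit to CoreMap, but not CoreNLP's code) and a wrapper that hides the key-class lookup behind a named tokens() method. ToyCoreMap, Key, TokensKey and ToySentence are hypothetical names invented for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WrapperSketch {

    /** Hypothetical typed key: the class object names the slot, V is its value type. */
    public interface Key<V> {}

    /** Hypothetical key for a sentence's tokens (a stand-in for a TokensAnnotation-style key). */
    public static final class TokensKey implements Key<List<String>> {}

    /** Toy typesafe heterogeneous map keyed by class (not CoreNLP's CoreMap). */
    public static final class ToyCoreMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        public <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        @SuppressWarnings("unchecked")
        public <V> V get(Class<? extends Key<V>> key) {
            // The cast is safe as long as values are only stored via set(), which ties V to the key.
            return (V) values.get(key);
        }
    }

    /** Toy wrapper: callers ask for tokens() instead of passing a key class. */
    public static final class ToySentence {
        private final ToyCoreMap map;

        public ToySentence(ToyCoreMap map) {
            this.map = map;
        }

        public List<String> tokens() {
            return map.get(TokensKey.class);
        }
    }

    public static void main(String[] args) {
        ToyCoreMap raw = new ToyCoreMap();
        raw.set(TokensKey.class, List.of("Joe", "Smith", "went", "to", "Hawaii", "."));

        // "Old" style: look the value up by its key class.
        List<String> viaKey = raw.get(TokensKey.class);
        // "New" style: the wrapper performs the same lookup for you.
        List<String> viaWrapper = new ToySentence(raw).tokens();

        System.out.println(viaKey.equals(viaWrapper)); // prints "true"
    }
}
```

Both calls return the same list; the wrapper only trades the generic key-based lookup for a discoverable, named method.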
Regarding entity mentions: if the sentence is "Joe Smith went to Hawaii.", you would get two entity mentions:
- Joe Smith (2 tokens)
- Hawaii (1 token)
Traditionally, the ner annotator would tag every token in the sentence with its named entity type. Then a separate entitymentions annotator would build Mention annotations, which were CoreMap representations of full entity mentions (e.g. "Joe Smith").
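To make the second step concrete, here is a simplified sketch of going from per-token tags to mentions: merge each run of adjacent tokens that share the same non-"O" NER tag. It is plain Java with hypothetical Token and Mention records standing in for CoreLabel and the Mention CoreMap; it is not the entitymentions annotator's actual algorithm, whose logic may differ and handle more cases.

```java
import java.util.ArrayList;
import java.util.List;

public class MentionGrouping {

    /** Hypothetical minimal token: the word plus its NER tag ("O" = not an entity). */
    public record Token(String word, String ner) {}

    /** Hypothetical mention: joined text, entity type and number of tokens. */
    public record Mention(String text, String type, int tokenCount) {}

    /** Merges runs of adjacent tokens with the same non-"O" tag into mentions. */
    public static List<Mention> groupMentions(List<Token> tokens) {
        List<Mention> mentions = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String currentType = null; // null means no mention is currently open

        for (Token t : tokens) {
            if (currentType != null && !t.ner().equals(currentType)) {
                // Tag changed: close the open mention.
                mentions.add(new Mention(String.join(" ", current), currentType, current.size()));
                current.clear();
                currentType = null;
            }
            if (!"O".equals(t.ner())) {
                // Start a new mention or extend the open one.
                current.add(t.word());
                currentType = t.ner();
            }
        }
        if (currentType != null) {
            // A mention that runs to the end of the sentence.
            mentions.add(new Mention(String.join(" ", current), currentType, current.size()));
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Tags as an NER tagger might assign them to "Joe Smith went to Hawaii."
        List<Token> tokens = List.of(
                new Token("Joe", "PERSON"), new Token("Smith", "PERSON"),
                new Token("went", "O"), new Token("to", "O"),
                new Token("Hawaii", "LOCATION"), new Token(".", "O"));

        for (Mention m : groupMentions(tokens)) {
            System.out.println(m.text() + " [" + m.type() + "], " + m.tokenCount() + " token(s)");
        }
        // Joe Smith [PERSON], 2 token(s)
        // Hawaii [LOCATION], 1 token(s)
    }
}
```

Because this groups purely by tag equality, two distinct entities of the same type that happen to be adjacent would be merged into one mention. Edge cases like that are a good reason to rely on the mentions the pipeline builds for you rather than reimplementing the grouping.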
Over the years I've seen a lot of people ask, "How do I go from a tagged sequence of tokens to the full entity mentions?" In response, we tried to make it much easier to extract the full entities referred to in the sentence.
I should also note that, for the most part, the older ways should still work. Updated documentation is on the way as we finalize the 3.9.0 release!
Source: https://stackoverflow.com/questions/48632256/stanford-nlp-3-9-0-does-using-coreentitymention-combine-adjacent-entity-mention