Stanford NLP 3.9.0: Does using CoreEntityMention combine adjacent entity mentions?

Question

I am testing the new 3.9.0 way of getting entity mentions with CoreEntityMention. I do something like this:

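    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.CoreEntityMention;
    import edu.stanford.nlp.pipeline.CoreSentence;

    // createNerPipeline(), stringify() and logger are helpers not shown here
    CoreDocument document = new CoreDocument(text);
    stanfordPipe = createNerPipeline();
    stanfordPipe.annotate(document);

    for (CoreSentence sentence : document.sentences()) {
        logger.debug("Found sentence {}", sentence);
        // skip sentences that carry no entity mention list
        if (sentence.entityMentions() == null) continue;
        for (CoreEntityMention cem : sentence.entityMentions()) {
            logger.debug("Found em {}", stringify(cem));
        }
    }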

When I iterate through entity mentions using sentence.entityMentions(), I see that some of them are multi-token entity mentions. With the old way of getting entity mentions (correct me if I am wrong), you have to iterate over the CoreLabel tokens and combine multi-token entity mentions yourself.

So is there a new method, which did not exist before, that combines adjacent tokens with the same NER label? Or have I missed an older way to combine multi-token entity mentions?

Answer 1:

Hi, thanks for using the new interface!

Yes, a CoreEntityMention is meant to represent a full entity mention. This is new syntax we added to make our code easier to work with.

Traditionally, you needed expressions like sentence.get(CoreAnnotations.TokensAnnotation.class) and so on, so we added some wrapper classes that let people use the pipeline interface without the cumbersome syntax.

With this newly introduced syntax, you can simply write:

    sentence.tokens();
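The difference between the two styles comes down to where the key lookup lives. The toy Java model below is purely illustrative: AnnotationMap, TokensKey, and Sentence are hypothetical names, not CoreNLP classes. Values are stored in a map keyed by marker classes, in the spirit of a CoreMap, and a thin wrapper exposes the same value through a plain tokens() method, so callers never touch the key class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical classes, not CoreNLP's): an annotation store keyed
 * by marker classes, plus a thin wrapper that hides the keyed lookup.
 */
public class WrapperDemo {

    /** A key type that fixes the value type stored under it. */
    interface Key<V> {}

    /** Hypothetical key for a sentence's token list. */
    static final class TokensKey implements Key<List<String>> {}

    /** Minimal typesafe heterogeneous map, in the spirit of a CoreMap. */
    static final class AnnotationMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        // Safe cast: set() only ever stores a V under a Key<V> class.
        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            return (V) values.get(key);
        }
    }

    /** Hypothetical wrapper: callers write sentence.tokens() instead of a keyed get. */
    static final class Sentence {
        private final AnnotationMap annotations;

        Sentence(AnnotationMap annotations) {
            this.annotations = annotations;
        }

        List<String> tokens() {
            return annotations.get(TokensKey.class);
        }
    }

    public static void main(String[] args) {
        AnnotationMap map = new AnnotationMap();
        map.set(TokensKey.class, Arrays.asList("Joe", "Smith", "went", "to", "Hawaii", "."));

        // "Old" style: look the value up by its key class.
        List<String> viaKey = map.get(TokensKey.class);
        // "New" style: the wrapper does the lookup for you.
        List<String> viaWrapper = new Sentence(map).tokens();

        System.out.println(viaKey.equals(viaWrapper));
    }
}
```

Running main prints true, since both styles read the same stored list; the wrapper only changes how the caller spells the lookup.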

Regarding entity mentions, if the sentence is "Joe Smith went to Hawaii.", you would get two entity mentions:

Joe Smith (2 tokens) and Hawaii (1 token).

Traditionally, the ner annotator would tag every token in the sentence with its named entity type. Then a separate entitymentions annotator would build Mention annotations, which were CoreMap representations of full entity mentions (e.g., Joe Smith).
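To make the "tag every token, then group" step concrete, here is a toy Java sketch that merges adjacent tokens carrying the same NER label into a single mention. It does not use CoreNLP: MergeMentions, Mention, and merge are hypothetical names, and treating "O" as the label for non-entity tokens is an assumption that mirrors the usual convention. The real entitymentions annotator is more involved; for instance, this toy would fuse two different people named back to back, because it only looks at label changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy illustration (not CoreNLP code): group adjacent tokens that share the
 * same NER label into one mention. "O" marks tokens outside any entity.
 */
public class MergeMentions {

    /** Hypothetical stand-in for a full entity mention. */
    static final class Mention {
        final String text;
        final String type;
        final int tokenCount;

        Mention(String text, String type, int tokenCount) {
            this.text = text;
            this.type = type;
            this.tokenCount = tokenCount;
        }

        @Override
        public String toString() {
            return text + " (" + type + ", " + tokenCount + " token" + (tokenCount == 1 ? "" : "s") + ")";
        }
    }

    static List<Mention> merge(List<String> words, List<String> labels) {
        List<Mention> mentions = new ArrayList<>();
        int start = -1;          // index where the current run of one label began
        String runLabel = null;  // label of the current run, or null outside a run
        for (int i = 0; i <= words.size(); i++) {
            // i == words.size() acts as a sentinel "O" that flushes the last run
            String label = i < words.size() ? labels.get(i) : "O";
            if (runLabel != null && !label.equals(runLabel)) {
                // the run ended: emit words[start, i) as one mention
                mentions.add(new Mention(String.join(" ", words.subList(start, i)), runLabel, i - start));
                runLabel = null;
            }
            if (runLabel == null && !label.equals("O")) {
                // a new entity run starts at this token
                start = i;
                runLabel = label;
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Joe", "Smith", "went", "to", "Hawaii", ".");
        List<String> labels = Arrays.asList("PERSON", "PERSON", "O", "O", "LOCATION", "O");
        for (Mention m : merge(words, labels)) {
            System.out.println(m);
        }
    }
}
```

For the example sentence, main prints "Joe Smith (PERSON, 2 tokens)" followed by "Hawaii (LOCATION, 1 token)", matching the two mentions described above.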

Over the years I've seen a lot of people ask, "How do I go from a tagged sequence of tokens to the full entity mentions?" In response, we tried to make it much easier to extract the full entities referred to in a sentence.

I should also note that, for the most part, the older ways should still work. Updated documentation is on the way as we finalize the 3.9.0 release!

Source: https://stackoverflow.com/questions/48632256/stanford-nlp-3-9-0-does-using-coreentitymention-combine-adjacent-entity-mention

Tags

stanford-nlp

named-entity-recognition